I am trying to get audio playing with libav using XAudio2. The XAudio2 code I am using works with an older FFmpeg using avcodec_decode_audio2, but that has been deprecated in favour of avcodec_decode_audio4. I have tried following various libav examples, but can't seem to get the audio to play. Video plays fine (or rather it just plays too fast right now, as I haven't written any sync code yet).
First the audio is initialized with no errors, then the video, and then the packet loop runs:
while (1) {
    // is this packet from the video or audio stream?
    if (packet.stream_index == player.v_id) {
        add_video_to_queue(&packet);
    } else if (packet.stream_index == player.a_id) {
        add_sound_to_queue(&packet);
    } else {
        av_free_packet(&packet);
    }
}
Then in add_sound_to_queue:
int add_sound_to_queue(AVPacket * packet) {
    AVFrame *decoded_frame = NULL;
    int done = AVCODEC_MAX_AUDIO_FRAME_SIZE;
    int got_frame = 0;
    if (!decoded_frame) {
        if (!(decoded_frame = avcodec_alloc_frame())) {
            printf("[ADD_SOUND_TO_QUEUE] Out of memory\n");
            return -1;
        }
    } else {
        avcodec_get_frame_defaults(decoded_frame);
    }
    if (avcodec_decode_audio4(player.av_acodecctx, decoded_frame, &got_frame, packet) < 0) {
        printf("[ADD_SOUND_TO_QUEUE] Error in decoding audio\n");
        av_free_packet(packet);
        //continue;
        return -1;
    }
    if (got_frame) {
        int data_size;
        if (packet->size > done) {
            data_size = done;
        } else {
            data_size = packet->size;
        }
        BYTE * snd = (BYTE *)malloc( data_size * sizeof(BYTE));
        XMemCpy(snd,
                AudioBytes,
                data_size * sizeof(BYTE)
        );
        XMemSet(&g_SoundBuffer, 0, sizeof(XAUDIO2_BUFFER));
        g_SoundBuffer.AudioBytes = data_size;
        g_SoundBuffer.pAudioData = snd;
        g_SoundBuffer.pContext = (VOID*)snd;
        XAUDIO2_VOICE_STATE state;
        while( g_pSourceVoice->GetState( &state ), state.BuffersQueued > 60 ) {
            WaitForSingleObject( XAudio2_Notifier.hBufferEndEvent, INFINITE );
        }
        g_pSourceVoice->SubmitSourceBuffer( &g_SoundBuffer );
    }
    return 0;
}
I can't seem to figure out the problem. I have added error messages in init, opening the video, codec handling, etc. As mentioned before, the XAudio2 code works with an older FFmpeg, so maybe I have missed something with avcodec_decode_audio4?
If this snippet of code isn't enough, I can post the whole thing; these are just the places in the code where I think the problem would be :(
I don't see you accessing decoded_frame anywhere after decoding. How do you expect to get the data out otherwise?
BYTE * snd = (BYTE *)malloc( data_size * sizeof(BYTE));
This also looks very fishy, given that data_size is derived from the packet size. The packet size is the size of the compressed data; it has very little to do with the size of the decoded PCM frame.
The decoded data is located in decoded_frame->extended_data, which is an array of pointers to data planes, see here for details. The size of the decoded data is determined by decoded_frame->nb_samples. Note that with recent Libav versions, many decoders return planar audio, so different channels live in different data buffers. For many use cases you need to convert that to interleaved format, where there's just one buffer with all the channels. Use libavresample for that.
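For example, a minimal sketch of what the copy could look like (reusing player.av_acodecctx and the snd/XMemCpy names from the question; error handling and the actual planar-to-interleaved conversion are left out):
if (got_frame) {
    /* size of the decoded audio, NOT the compressed packet size */
    int data_size = av_samples_get_buffer_size(NULL,
                                               player.av_acodecctx->channels,
                                               decoded_frame->nb_samples,
                                               player.av_acodecctx->sample_fmt,
                                               1);
    if (data_size < 0)
        return -1;

    BYTE *snd = (BYTE *)malloc(data_size);
    if (!av_sample_fmt_is_planar(player.av_acodecctx->sample_fmt)) {
        /* interleaved: all channels live in the first data plane */
        XMemCpy(snd, decoded_frame->extended_data[0], data_size);
    } else {
        /* planar: one plane per channel in extended_data[0..channels-1];
         * convert to interleaved with libavresample before queuing it */
    }
    /* ... fill g_SoundBuffer with snd/data_size and submit it as before ... */
}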
I'm making a discord bot with Discord.js v14 that records users' audio as individual files and one collective file. As Discord.js streams do not interpolate silence, my question is how to interpolate silence into streams.
My code is based off the Discord.js recording example.
In essence, a privileged user enters a voice channel (or stage), runs /record and all the users in that channel are recorded up until the point that they run /leave.
I've tried using Node packages like combined-stream, audio-mixer, multistream and multipipe, but I'm not familiar enough with Node streams to combine their strengths while working around their weaknesses. I'm not entirely sure how to go about interpolating silence either, whether it be through a Transform (which likely requires the stream to be continuous, or for the receiver stream to be applied onto silence) or through a sort of "multi-stream" that swaps between piping the stream and a silence buffer. I also have yet to overlay the audio files (e.g., with ffmpeg).
Would it even be possible for a Readable to await an audio chunk and, if none is given within a certain timeframe, push a chunk of silence instead? My attempt at doing so is below (again, based off the Discord.js recorder example):
// CREDIT TO: https://stackoverflow.com/a/69328242/8387760
const SILENCE = Buffer.from([0xf8, 0xff, 0xfe]);

async function createListeningStream(connection, userId) {
    // Creating manually terminated stream
    let receiverStream = connection.receiver.subscribe(userId, {
        end: {
            behavior: EndBehaviorType.Manual
        },
    });

    // Interpolating silence
    // TODO Increases file length over tenfold by stretching audio?
    let userStream = new Readable({
        read() {
            receiverStream.on('data', chunk => {
                if (chunk) {
                    this.push(chunk);
                }
                else {
                    // Never occurs
                    this.push(SILENCE);
                }
            });
        }
    });

    /* Piping userStream to file at 48kHz sample rate */
}
As an unnecessary bonus, it would help if it were possible to check whether a user ever spoke or not to eliminate creating empty recordings.
Thanks in advance.
Related:
Record all users in a voice channel in discord js v12
Adding silent frames to a node js stream when no data is received
After a lot of reading about Node streams, the solution I procured was unexpectedly simple.
Create a boolean variable recording that is true when the recording should continue and false when it should stop
Create a buffer to handle backpressure (i.e., when data comes in faster than it is read out)
let buffer = [];
Create a readable stream that the received user audio stream is fed into
// New audio stream (with silence)
let userStream = new Readable({
    // ...
});

// User audio stream (without silence)
let receiverStream = connection.receiver.subscribe(userId, {
    end: {
        behavior: EndBehaviorType.Manual,
    },
});
receiverStream.on('data', chunk => buffer.push(chunk));
In that stream's read method, handle the recording with a 20 ms timer, matching the packet interval of the 48 kHz user audio stream
read() {
    if (recording) {
        let delay = new NanoTimer();
        delay.setTimeout(() => {
            if (buffer.length > 0) {
                this.push(buffer.shift());
            }
            else {
                this.push(SILENCE);
            }
        }, '', '20m');
    }
    // ...
}
In the same method, also handle ending the stream
// ...
else if (buffer.length > 0) {
    // Stream is ending: sending buffered audio ASAP
    this.push(buffer.shift());
}
else {
    // Ending stream
    this.push(null);
}
If we put it all together:
const NanoTimer = require('nanotimer'); // node
/* import NanoTimer from 'nanotimer'; */ // es6

const SILENCE = Buffer.from([0xf8, 0xff, 0xfe]);

async function createListeningStream(connection, userId) {
    // Accumulates very, very slowly, but only when user is speaking: reduces buffer size otherwise
    let buffer = [];

    // Interpolating silence into user audio stream
    // (`recording` is the boolean from step 1, held in an outer scope; it is set
    //  to false when the recording should stop, e.g. when /leave is run)
    let userStream = new Readable({
        read() {
            if (recording) {
                // Pushing audio at the same rate of the receiver
                // (Could probably be replaced with standard, less precise timer)
                let delay = new NanoTimer();
                delay.setTimeout(() => {
                    if (buffer.length > 0) {
                        this.push(buffer.shift());
                    }
                    else {
                        this.push(SILENCE);
                    }
                    // delay.clearTimeout();
                }, '', '20m'); // one tick per 20 ms chunk, matching the packet interval of the 48 kHz stream
            }
            else if (buffer.length > 0) {
                // Sending buffered audio ASAP
                this.push(buffer.shift());
            }
            else {
                // Ending stream
                this.push(null);
            }
        }
    });

    // Redirecting user audio to userStream to have silence interpolated
    let receiverStream = connection.receiver.subscribe(userId, {
        end: {
            behavior: EndBehaviorType.Manual, // Manually closed elsewhere
        },
        // mode: 'pcm',
    });
    receiverStream.on('data', chunk => buffer.push(chunk));

    // pipeline(userStream, ...), etc.
}
From here, you can pipe that stream into a fileWriteStream, etc. for individual purposes. Note that it's a good idea to also close the receiverStream whenever recording = false with something like:
connection.receiver.subscriptions.delete(userId);
Likewise, the userStream should also be closed if it isn't already being closed for you, e.g., as the first argument of the pipeline method.
As a side note, although outside the scope of my original question, there are many other modifications you can make to this. For instance, you can prepend silence to the audio before piping the receiverStream's data to the userStream, e.g., to make multiple audio streams the same length:
// let startTime = ...
let creationTime;
for (let i = startTime; i < (creationTime = Date.now()); i++) {
    buffer.push(SILENCE);
}
Happy coding!
I have a simple C program that plays audio using the ALSA APIs and I wish to know the precise timing of the audio buffers.
I am attempting to retrieve the timestamps from the audio driver using ALSA's snd_pcm_htimestamp functionality, which returns two values - a timestamp and a frame count.
However, the timestamp returned from ALSA is unset (zero values). The second returned variable, the "number of available frames when timestamp was grabbed", looks to be set correctly. Does anyone have an idea as to why the timestamps are seemingly unset?
I am configuring timestamps to be activated in my setup like so:
err = snd_pcm_sw_params_set_tstamp_mode(pcmHandle, swparams, SND_PCM_TSTAMP_ENABLE);
if (err < 0) {
    printf("Unable to set timestamp mode: %s\n", snd_strerror(err));
    return err;
}
And I verify that it has been set:
snd_pcm_tstamp_t timestampMode;
err = snd_pcm_sw_params_get_tstamp_mode(swparams, &timestampMode);
if (timestampMode != SND_PCM_TSTAMP_ENABLE)
{
    // error ...
}
Then in the program's main while loop, after I feed ALSA with samples using snd_pcm_writei, I attempt to obtain that buffer's timestamp like so:
snd_pcm_writei(pcmHandle, samples, frameCount);

snd_htimestamp_t ts;
snd_pcm_uframes_t avail;
err = snd_pcm_htimestamp(pcmHandle, &avail, &ts);
if (err < 0)
{
    printf("Unable to get timestamp: %s\n", snd_strerror(err));
    return err;
}
printf("avail: %lu\n", avail);
printf("%lld.%.9ld\n", (long long)ts.tv_sec, ts.tv_nsec);
However, whilst avail seems to be set, ts is always 0.000000000.
I am on a Raspberry Pi running Raspbian with an ADA1475 audio interface.
Thanks in advance,
Andy
The change to swparams must be applied to the PCM interface with snd_pcm_sw_params().
/* Allocate a temporary swparams struct */
snd_pcm_sw_params_t *swparams;
snd_pcm_sw_params_alloca(&swparams);

/* Retrieve current SW parameters. */
snd_pcm_sw_params_current(pcmHandle, swparams);

/* Change software parameters. */
snd_pcm_sw_params_set_tstamp_mode(pcmHandle, swparams, SND_PCM_TSTAMP_ENABLE);
snd_pcm_sw_params_set_tstamp_type(pcmHandle, swparams, SND_PCM_TSTAMP_TYPE_GETTIMEOFDAY);

/* Apply updated software parameters to PCM interface. */
snd_pcm_sw_params(pcmHandle, swparams); // <-- Change takes effect here.
ALSA allows software parameters to be changed at any time, even while the stream is running.
The GStreamer version in use is 1.8.1.
Currently I have code that receives a GStreamer encoded stream and plays it through my sound card. I want to modify it to instead give my application access to the raw uncompressed audio data. This should result in an array of integer sound samples, and if I were to plot them I would see the audio waveform (e.g. a perfect tone would be a nice sine wave), and if I were to append the most recent array to the previous one received by the callback I wouldn't see any discontinuity.
This is the current playback code:
https://github.com/lucasw/audio_common/blob/master/audio_play/src/audio_play.cpp
I think I need to change the alsasink to an appsink and set up a callback that will get the latest chunk of audio after it has passed through the decoder. This is adapted from https://github.com/jojva/gst-plugins-base/blob/master/tests/examples/app/appsink-src.c :
_sink = gst_element_factory_make("appsink", "sink");
g_object_set (G_OBJECT (_sink), "emit-signals", TRUE,
              "sync", FALSE, NULL);
g_signal_connect (_sink, "new-sample",
                  G_CALLBACK (on_new_sample_from_sink), this);
And then there is the callback:
static GstFlowReturn
on_new_sample_from_sink (GstElement * elt, gpointer data)
{
  RosGstProcess *client = reinterpret_cast<RosGstProcess*>(data);
  GstSample *sample;
  GstBuffer *app_buffer, *buffer;
  GstElement *source;

  /* get the sample from appsink */
  sample = gst_app_sink_pull_sample (GST_APP_SINK (elt));
  buffer = gst_sample_get_buffer (sample);

  /* make a copy */
  app_buffer = gst_buffer_copy (buffer);

  /* we don't need the appsink sample anymore */
  gst_sample_unref (sample);

  /* get source and push new buffer */
  source = gst_bin_get_by_name (GST_BIN (client->_sink), "app_source");
  return gst_app_src_push_buffer (GST_APP_SRC (source), app_buffer);
}
Can I get at the data in that callback? What am I supposed to do with the GstFlowReturn? If it passes the data to another pipeline element, I don't want that; I'd rather get the data there and be done.
https://github.com/lucasw/audio_common/blob/appsink/audio_process/src/audio_process.cpp
Is the gpointer data passed to that callback exactly what I want (cast to a gint16 array?), or otherwise how do I convert and access it?
The GstFlowReturn is merely a return value for the underlying base classes. If you returned an error there, the pipeline would probably stop, because... well, there was a critical error.
The cb_need_data events are triggered by your appsrc element. This can be used as a throttling mechanism if needed. Since you probably use the appsrc in a pure push mode (as soon as something arrives at the appsink you push it to the appsrc), you can ignore these. You also explicitly disable these events on the appsrc element. (Or do you still use that one?)
The data format in the buffer depends on the caps that the decoder and appsink agreed on. That is usually the decoder's preferred format. You may have some control over this format depending on the decoder, or you can convert it to your preferred format. It may be worthwhile to check the format; Float32 is not that uncommon.
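For instance, a sketch (with a made-up callback name) of how the negotiated caps can be inspected from the pulled sample; the S16LE interpretation at the end is only an assumption about what your decoder produced:
static GstFlowReturn
check_format_from_sink (GstElement * elt, gpointer data)
{
  GstSample *sample = gst_app_sink_pull_sample (GST_APP_SINK (elt));
  GstCaps *caps = gst_sample_get_caps (sample);
  GstStructure *s = gst_caps_get_structure (caps, 0);

  /* print what the decoder and appsink agreed on */
  const gchar *format = gst_structure_get_string (s, "format");
  gint rate = 0, channels = 0;
  gst_structure_get_int (s, "rate", &rate);
  gst_structure_get_int (s, "channels", &channels);
  g_print ("format=%s rate=%d channels=%d\n", format ? format : "?", rate, channels);

  GstBuffer *buffer = gst_sample_get_buffer (sample);
  GstMapInfo map;
  if (gst_buffer_map (buffer, &map, GST_MAP_READ))
  {
    /* if the negotiated format really is S16LE, the payload is
     * interleaved signed 16-bit samples */
    const gint16 *samples = (const gint16 *) map.data;
    gsize n_samples = map.size / sizeof (gint16);
    (void) samples; (void) n_samples; /* ... process ... */
    gst_buffer_unmap (buffer, &map);
  }

  gst_sample_unref (sample);
  return GST_FLOW_OK;  /* nothing went wrong, so let the stream continue */
}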
I kind of forgot what your actual question was, I'm afraid..
I can interpret the data from the modified callback below (there is a script that plots it to the screen); it looks like the uint8 array holds signed 16-bit samples.
I'm not clear on the proper return value for the callback, and there is a cb_need_data callback set up elsewhere in the code that gets triggered all the time with this code.
static void // GstFlowReturn
on_new_sample_from_sink (GstElement * elt, gpointer data)
{
  RosGstProcess *client = reinterpret_cast<RosGstProcess*>(data);
  GstSample *sample;
  GstBuffer *buffer;

  /* get the sample from appsink */
  sample = gst_app_sink_pull_sample (GST_APP_SINK (elt));
  buffer = gst_sample_get_buffer (sample);

  GstMapInfo map;
  if (gst_buffer_map (buffer, &map, GST_MAP_READ))
  {
    audio_common_msgs::AudioData msg;
    msg.data.resize(map.size);
    // TODO(lucasw) copy this more efficiently
    for (size_t i = 0; i < map.size; ++i)
    {
      msg.data[i] = map.data[i];
    }
    gst_buffer_unmap (buffer, &map);
    client->_pub.publish(msg);
  }

  /* drop our reference to the sample so it isn't leaked */
  gst_sample_unref (sample);
}
https://github.com/lucasw/audio_common/tree/appsink
I use av_read_frame in two ways below:
1.
for (;;) {
    if (av_read_frame(pFormatCtx, packet) >= 0) {
        av_packet_unref(packet);
    }
}
2.
for (;;) {
    packet = av_packet_alloc();
    if (av_read_frame(pFormatCtx, packet) >= 0) {
        av_packet_free(&packet);
    }
}
Nothing is done except reading the file and calling the free functions of the FFmpeg API.
Both versions have a memory leak problem (a 1.72 GB film file gives about a 300 MB memory increase).
Does anyone have a solution?
I have been trying to figure out how to "speak" a text into a memory buffer using Windows SAPI 5.1, but so far with no success, even though it seems it should be quite simple.
There is an example of streaming the synthesized speech into a .wav file, but no examples of how to stream it to a memory buffer.
In the end I need to have the synthesized speech in a char* array in 16 kHz 16-bit little-endian PCM format. Currently I create a temp .wav file, redirect speech output there, then read it, but it seems to be a rather stupid solution.
Does anyone know how to do that?
Thanks!
Look at ISpStream::SetBaseStream. Here's a little helper:
inline HRESULT SPCreateStreamOnHGlobal(
    HGLOBAL hGlobal,            // Memory handle for the stream object
    BOOL fDeleteOnRelease,      // Whether to free memory when the object is released
    const WAVEFORMATEX * pwfex, // WaveFormatEx for stream
    ISpStream ** ppStream)      // Address of variable to receive ISpStream pointer
{
    HRESULT hr;
    IStream * pMemStream;
    *ppStream = NULL;
    hr = ::CreateStreamOnHGlobal(hGlobal, fDeleteOnRelease, &pMemStream);
    if (SUCCEEDED(hr))
    {
        hr = ::CoCreateInstance(CLSID_SpStream, NULL, CLSCTX_ALL, __uuidof(*ppStream), (void **)ppStream);
        if (SUCCEEDED(hr))
        {
            hr = (*ppStream)->SetBaseStream(pMemStream, SPDFID_WaveFormatEx, pwfex);
            if (FAILED(hr))
            {
                (*ppStream)->Release();
                *ppStream = NULL;
            }
        }
        pMemStream->Release();
    }
    return hr;
}
I accomplished it using ISpStream. Use the SetBaseStream method of the ISpStream to bind it to an IStream, and then set the output of the ISpVoice to that ISpStream.
Here is my working solution if anybody wants it:
https://github.com/itsyash/MS-SAPI-demo
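For reference, a rough sketch of how those pieces can fit together, reusing the SPCreateStreamOnHGlobal helper from the other answer; the 16 kHz / 16-bit / mono WAVEFORMATEX matches what the question asks for, COM initialization and most cleanup on error are assumed to happen elsewhere, and SpeakToBuffer is just an illustrative name, not a SAPI function:
#include <sapi.h>
#include <sphelper.h>
#include <cstdlib>
#include <cstring>

HRESULT SpeakToBuffer(const wchar_t *text, char **outData, ULONG *outSize)
{
    // 16 kHz, 16-bit, mono PCM, as required in the question.
    WAVEFORMATEX wfx = { 0 };
    wfx.wFormatTag      = WAVE_FORMAT_PCM;
    wfx.nChannels       = 1;
    wfx.nSamplesPerSec  = 16000;
    wfx.wBitsPerSample  = 16;
    wfx.nBlockAlign     = wfx.nChannels * wfx.wBitsPerSample / 8;
    wfx.nAvgBytesPerSec = wfx.nSamplesPerSec * wfx.nBlockAlign;

    ISpStream *pSpStream = NULL;
    HRESULT hr = SPCreateStreamOnHGlobal(NULL, TRUE, &wfx, &pSpStream);
    if (FAILED(hr)) return hr;

    ISpVoice *pVoice = NULL;
    hr = ::CoCreateInstance(CLSID_SpVoice, NULL, CLSCTX_ALL,
                            IID_ISpVoice, (void **)&pVoice);
    if (SUCCEEDED(hr))
    {
        // Route synthesis into the memory stream instead of the sound card.
        hr = pVoice->SetOutput(pSpStream, TRUE);
        if (SUCCEEDED(hr))
            hr = pVoice->Speak(text, SPF_DEFAULT, NULL);
        pVoice->Release();
    }

    if (SUCCEEDED(hr))
    {
        // Pull the raw PCM back out of the underlying memory stream.
        IStream *pBase = NULL;
        hr = pSpStream->GetBaseStream(&pBase);
        if (SUCCEEDED(hr))
        {
            STATSTG stat;
            hr = pBase->Stat(&stat, STATFLAG_NONAME);
            HGLOBAL hGlobal = NULL;
            if (SUCCEEDED(hr))
                hr = ::GetHGlobalFromStream(pBase, &hGlobal);
            if (SUCCEEDED(hr))
            {
                *outSize = stat.cbSize.LowPart;
                *outData = (char *)malloc(*outSize);
                memcpy(*outData, ::GlobalLock(hGlobal), *outSize);
                ::GlobalUnlock(hGlobal);
            }
            pBase->Release();
        }
    }
    pSpStream->Release();
    return hr;
}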
Do you know how to create a memory-mapped file? You could see if the ISpStream will bind to it.