How do I use FFMPEG/libav to access the data in individual audio samples? - audio

The end result is I'm trying to visualise the audio waveform to use in a DAW-like software. So I want to get each sample's value and draw it. With that in mind, I'm currently stumped by trying to gain access to the values stored in each sample. For the time being, I'm just trying to access the value in the first sample - I'll build it into a loop once I have some working code.
I started off by following the code in this example. However, LibAV/FFMPEG has been updated since then, so a lot of the code is deprecated or straight up doesn't work the same anymore.
Based on the example above, I believe the logic is as follows:
1. get the formatting info of the audio file
2. get audio stream info from the format
3. check that the codec required for the stream is an audio codec
4. get the codec context (I think this is info about the codec) - This is where it gets kinda confusing for me
5. create an empty packet and frame to use - packets are for holding compressed data and frames are for holding uncompressed data
6. the format reads the first frame from the audio file into our packet
7. pass that packet into the codec context to be decoded
8. pass our frame to the codec context to receive the uncompressed audio data of the first frame
9. create a buffer to hold the values and try allocating samples to it from our frame
From debugging my code, I can see that step 7 succeeds and the packet that was empty receives some data. In step 8, the frame doesn't receive any data. This is what I need help with. I get that if I get the frame, assuming a stereo audio file, I should have two samples per frame, so really I just need your help to get uncompressed data into the frame.
I've scoured through the documentation for loads of different classes and I'm pretty sure I'm using the right classes and functions to achieve my goal, but evidently not (I'm also using Qt, so I'm using qDebug throughout, and QString to hold the URL for the audio file as path). So without further ado, here's my code:
// Step 1 - get the formatting info of the audio file
AVFormatContext* format = avformat_alloc_context();
if (avformat_open_input(&format, path.toStdString().c_str(), NULL, NULL) != 0) {
qDebug() << "Could not open file " << path;
return -1;
}
// Step 2 - get audio stream info from the format
if (avformat_find_stream_info(format, NULL) < 0) {
qDebug() << "Could not retrieve stream info from file " << path;
return -1;
}
// Step 3 - check that the codec required for the stream is an audio codec
int stream_index = -1;
for (unsigned int i=0; i<format->nb_streams; i++) {
if (format->streams[i]->codecpar->codec_type == AVMEDIA_TYPE_AUDIO) {
stream_index = i;
break;
}
}
if (stream_index == -1) {
qDebug() << "Could not retrieve audio stream from file " << path;
return -1;
}
// Step 4 -get the codec context
const AVCodec *codec = avcodec_find_decoder(format->streams[stream_index]->codecpar->codec_id);
AVCodecContext *codecContext = avcodec_alloc_context3(codec);
avcodec_open2(codecContext, codec, NULL);
// Step 5 - create an empty packet and frame to use
AVPacket *packet = av_packet_alloc();
AVFrame *frame = av_frame_alloc();
// Step 6 - the format reads the first frame from the audio file into our packet
av_read_frame(format, packet);
// Step 7 - pass that packet into the codec context to be decoded
avcodec_send_packet(codecContext, packet);
//Step 8 - pass our frame to the codec context to receive the uncompressed audio data of the first frame
avcodec_receive_frame(codecContext, frame);
// Step 9 - create a buffer to hold the values and try allocating samples to it from our frame
double *buffer;
av_samples_alloc((uint8_t**) &buffer, NULL, 1, frame->nb_samples, AV_SAMPLE_FMT_DBL, 0);
qDebug() << "packet: " << packet;
qDebug() << "frame: " << frame;
qDebug () << "buffer: " << buffer;
For the time being, step 9 is incomplete as you can probably tell. But for now, I need help with step 8. Am I missing a step, using the wrong function, wrong class? Cheers.
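For the drawing side of the problem (once decoded samples are available), a minimal sketch of reducing samples to something drawable might look like the following. It assumes the samples have already been converted to mono floats in a std::vector; the helper name computePeaks is hypothetical and is not part of FFmpeg/libav or Qt. It only shows the common approach of drawing one min/max pair per pixel column rather than every single sample.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper (not an FFmpeg API): reduce decoded mono samples to one
// (min, max) pair per pixel column, which a waveform view can draw as a
// vertical line for that column.
std::vector<std::pair<float, float>> computePeaks(const std::vector<float>& samples,
                                                  std::size_t columns) {
    std::vector<std::pair<float, float>> peaks;
    if (samples.empty() || columns == 0) {
        return peaks;
    }
    peaks.reserve(columns);
    for (std::size_t c = 0; c < columns; ++c) {
        // Half-open range [begin, end) of samples covered by this column.
        std::size_t begin = c * samples.size() / columns;
        std::size_t end = (c + 1) * samples.size() / columns;
        if (end <= begin) {
            end = begin + 1; // more columns than samples: use a single sample
        }
        auto mm = std::minmax_element(samples.begin() + static_cast<std::ptrdiff_t>(begin),
                                      samples.begin() + static_cast<std::ptrdiff_t>(end));
        peaks.emplace_back(*mm.first, *mm.second);
    }
    return peaks;
}
```

Calling computePeaks(samples, widgetWidthInPixels) would give one pair per column; each pair can then be scaled to the widget height when painting.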

Related

sending audio via bluetooth a2dp source esp32

I am trying to send a measured I2S analogue signal (e.g. from a mic) to the sink device via Bluetooth instead of the default noise.
Currently I am trying to change bt_app_a2d_data_cb():
static int32_t bt_app_a2d_data_cb(uint8_t *data, int32_t i2s_read_len)
{
if (i2s_read_len < 0 || data == NULL) {
return 0;
}
char* i2s_read_buff = (char*) calloc(i2s_read_len, sizeof(char));
bytes_read = 0;
i2s_adc_enable(I2S_NUM_0);
while(bytes_read == 0)
{
i2s_read(I2S_NUM_0, i2s_read_buff, i2s_read_len,&bytes_read, portMAX_DELAY);
}
i2s_adc_disable(I2S_NUM_0);
// taking care of the watchdog//
TIMERG0.wdt_wprotect=TIMG_WDT_WKEY_VALUE;
TIMERG0.wdt_feed=1;
TIMERG0.wdt_wprotect=0;
uint32_t j = 0;
uint16_t dac_value = 0;
// change 16bit input signal to 8bit
for (int i = 0; i < i2s_read_len; i += 2) {
dac_value = ((((uint16_t) (i2s_read_buff[i + 1] & 0xf) << 8) | ((i2s_read_buff[i + 0]))));
data[j] = (uint8_t) dac_value * 256 / 4096;
j++;
}
// testing for loop
//uint8_t da = 0;
//for (int i = 0; i < i2s_read_len; i++) {
// data[i] = (uint8_t) (i2s_read_buff[i] >> 8);// & 0xff;
// da++;
// if(da>254) da=0;
//}
free(i2s_read_buff);
i2s_read_buff = NULL;
return i2s_read_len;
}
I can hear the sawtooth sound from the sink device.
Any ideas what to do?
Your data can be an array of floats representing the analog signal or its variations; for example, a 32 kHz sound signal contains 32,000 float values for every second of captured sound. If your data is to be transmitted in offline mode, you can prepare the outgoing data as a buffer followed by a terminator, then send the buffer through the Bluetooth module of the sender device, which is connected to the appropriate microcontroller. On the receiving device, once you get the terminator character, such as "\r", you can process the incoming buffer. In my case, I had to send a string array of numbers, but I often received at most one or two unknown characters at the start; to avoid this, I rejected them while filling the receive container.
how to trim unknown first characters of string in code vision
If you want online mode, i.e. your data must be transmitted and played concurrently, you must account for the delays and processing time of all the microcontrollers and devices involved, such as the Bluetooth module, EEPROM ICs, and so on.
I'm also working on a project "a2dp source esp32".
I'm playing a wav-file from spiffs.
If the wav-file is 44100, 16-bit, stereo then you can directly write a stream of bytes from the file to the array data[ ].
When I tried to write less data than the len variable specifies and return a smaller value (for example 88), I got an error. I am now trying to figure out how to reduce this buffer, because the latency is large (len=512).
Also, the data in the array data[ ] is stored as stereo.
Example: read data from file to data[ ]-array:
size_t read;
read = fread((void*) data, 1, len, fwave);//fwave is a file
if(read<len){//If get EOF, go to begin of the file
fseek(fwave , 0x2C , SEEK_SET);//skip wav-header 44 bytes
read = fread((void*) (&(data[read])), 1, len-read, fwave);//read up
}
If the file is mono, I convert it to stereo like this (I read half the length and then duplicate the data):
int32_t lenHalf=len/2;
read = fread((void*) data, 1, lenHalf, fwave);
if(read<lenHalf){
fseek(fwave , 0x2C , SEEK_SET);//skip wav-header 44 bytes
read = fread((void*) (&(data[read])), 1, lenHalf-read, fwave);//read up
}
//copy to the second channel
uint16_t *data16=(uint16_t*)data;
for (int i = lenHalf/2-1; i >= 0; i--) {
data16[(i << 1)] = data16[i];
data16[(i << 1) + 1] = data16[i];
}
I think you are getting the sawtooth sound because:
your data is mono?
the i2s_read_len in your "return i2s_read_len;" is less than len
you convert the 16-bit input signal to 8-bit ("// change 16bit input signal to 8bit"), but the array data[ ] holds 16-bit data: 2 bytes left, 2 bytes right, 2 bytes left, 2 bytes right, ...
I'm not sure, it's a guess.
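To illustrate the last point, here is a minimal sketch of turning mono ADC readings into the 16-bit stereo layout described above (left, right, left, right, ...). It is plain C++ rather than ESP-IDF code. The helper name adcMonoToStereoPcm16 is hypothetical, and it assumes each reading carries an unsigned 12-bit value in its low bits with silence at mid-scale (2048); neither assumption is confirmed by the question.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: expand unsigned 12-bit mono ADC readings (0..4095)
// into signed 16-bit stereo samples laid out as L, R, L, R, ...
// Assumption: silence sits at mid-scale (2048).
std::vector<int16_t> adcMonoToStereoPcm16(const std::vector<uint16_t>& adc) {
    std::vector<int16_t> pcm;
    pcm.reserve(adc.size() * 2);
    for (uint16_t raw : adc) {
        int value = raw & 0x0FFF;   // keep only the 12 data bits
        int centered = value - 2048; // now in -2048..2047
        // Scale 12-bit to 16-bit range: -32768..32752.
        int16_t sample = static_cast<int16_t>(centered * 16);
        pcm.push_back(sample); // left channel
        pcm.push_back(sample); // right channel (duplicate of left)
    }
    return pcm;
}
```

Each input reading produces four output bytes, so the number of readings taken per callback would have to be a quarter of the byte length the callback is asked to fill.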

android AudioTrack playback short array (16bit)

I have an application that plays back audio. It receives encoded audio data over RTP and decodes it to a 16-bit array. The decoded 16-bit array is converted to an 8-bit array (byte array), as this is required for some other functionality.
Even though audio playback is working, it breaks up continuously and the audio output is very hard to recognise. If I listen carefully, I can tell it is playing the correct audio.
I suspect this is because I convert the 16-bit data stream into a byte array and use the write(byte[], int, int, AudioTrack.WRITE_NON_BLOCKING) method of the AudioTrack class for audio playback.
Therefore I converted the byte array back to a short array and used the write(short[], int, int, AudioTrack.WRITE_NON_BLOCKING) method to see if it would resolve the problem.
However, now there is no sound at all. In the debug output I can see that the short array has data.
What could be the reason?
Here is the AudioTrack initialization:
sampleRate =AudioTrack.getNativeOutputSampleRate(AudioManager.STREAM_MUSIC);
minimumBufferSize = AudioTrack.getMinBufferSize(sampleRate, AudioFormat.CHANNEL_OUT_STEREO, AudioFormat.ENCODING_PCM_16BIT);
audioTrack = new AudioTrack(AudioManager.STREAM_MUSIC, sampleRate,
AudioFormat.CHANNEL_OUT_STEREO,
AudioFormat.ENCODING_PCM_16BIT,
minimumBufferSize,
AudioTrack.MODE_STREAM);
Here is the code converts short array to byte array
for (int i=0;i<internalBuffer.length;i++){
bufferIndex = i*2;
buffer[bufferIndex] = shortToByte(internalBuffer[i])[0];
buffer[bufferIndex+1] = shortToByte(internalBuffer[i])[1];
}
Here is the method that converts byte array to short array.
public short[] getShortAudioBuffer(byte[] b){
short audioBuffer[] = null;
int index = 0;
int audioSize = 0;
ByteBuffer byteBuffer = ByteBuffer.allocate(2);
if ((b == null) || (b.length < 2)){ // "||" so a null array is rejected before b.length is read
return null;
}else{
audioSize = (b.length - (b.length%2));
audioBuffer = new short[audioSize/2];
}
if ((audioSize/2) < 2)
return null;
byteBuffer.order(ByteOrder.LITTLE_ENDIAN);
for(int i=0;i<audioSize/2;i++){
index = i*2;
byteBuffer.put(b[index]);
byteBuffer.put(b[index+1]);
audioBuffer[i] = byteBuffer.getShort(0);
byteBuffer.clear();
System.out.print(Integer.toHexString(audioBuffer[i]) + " ");
}
System.out.println();
return audioBuffer;
}
Audio is decoded using opus library and the configuration is as follows;
opus_decoder_ctl(dec,OPUS_SET_APPLICATION(OPUS_APPLICATION_AUDIO));
opus_decoder_ctl(dec,OPUS_SET_SIGNAL(OPUS_SIGNAL_MUSIC));
opus_decoder_ctl(dec,OPUS_SET_FORCE_CHANNELS(OPUS_AUTO));
opus_decoder_ctl(dec,OPUS_SET_MAX_BANDWIDTH(OPUS_BANDWIDTH_FULLBAND));
opus_decoder_ctl(dec,OPUS_SET_PACKET_LOSS_PERC(0));
opus_decoder_ctl(dec,OPUS_SET_COMPLEXITY(10)); // highest complexity
opus_decoder_ctl(dec,OPUS_SET_LSB_DEPTH(16)); // 16bit = two byte samples
opus_decoder_ctl(dec,OPUS_SET_DTX(0)); // default - not using discontinuous transmission
opus_decoder_ctl(dec,OPUS_SET_VBR(1)); // use variable bit rate
opus_decoder_ctl(dec,OPUS_SET_VBR_CONSTRAINT(0)); // unconstrained
opus_decoder_ctl(dec,OPUS_SET_INBAND_FEC(0)); // no forward error correction
Let's assume you have a short[] array which contains the 16-bit, single-channel data to be played.
Each sample is then a value between -32768 and 32767 which represents the signal amplitude at that moment, and 0 represents the middle point (no signal). This array can be passed to the audio track with the ENCODING_PCM_16BIT format encoding.
Things get weird when ENCODING_PCM_8BIT is used (see AudioFormat).
In this case each sample is encoded as one byte, but each byte is unsigned. That means its value is between 0 and 255, with 128 representing the middle point.
Java has no unsigned byte type; byte is signed, so the values -128...-1 represent the actual values 128...255. You have to be careful when converting to the byte array, otherwise the result will be noise in which the source sound is barely recognizable.
short[] input16 = ...; // the source 16-bit audio data
byte[] output8 = new byte[input16.length];
for (int i = 0 ; i < input16.length ; i++) {
// To convert 16 bit signed sample to 8 bit unsigned
// We add 128 (for rounding), then shift it right 8 positions
// Then add 128 to be in range 0..255
int sample = ((input16[i] + 128) >> 8) + 128;
if (sample > 255) sample = 255; // strip out overload
output8[i] = (byte)(sample); // cast to signed byte type
}
The backward conversion works the same way: each sample of the input is converted to exactly one sample of the output signal.
byte[] input8 = ...; // source 8-bit unsigned audio data
short[] output16 = new short[input8.length];
for (int i = 0 ; i < input8.length ; i++) {
// to convert signed byte back to unsigned value just use bitwise AND with 0xFF
// then we need subtract 128 offset
// Then, just scale up the value by 256 to fit 16-bit range
output16[i] = (short)(((input8[i] & 0xFF) - 128) * 256);
}
The issue of not being able to convert data from a byte array to a short array was resolved when I used bitwise operators instead of ByteBuffer. It could be that I was not setting the correct parameters on the ByteBuffer, or that it is not suitable for this kind of conversion.
Nevertheless, implementing the conversion with bitwise operators resolved the problem. Since the original question has been resolved by this approach, please consider this the final answer.
I will raise a separate topic for playback issue.
Thank you for all your support.
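The post above does not show the bitwise conversion itself, so here is a minimal sketch of what such a conversion could look like, assuming little-endian 16-bit PCM (low byte first) as in the question's ByteOrder.LITTLE_ENDIAN setting. It is written in C++ for illustration and the function names are hypothetical. In Java the same shifts apply, with the extra step of masking the low byte with 0xFF because byte is signed.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: pack pairs of bytes (low byte first) into signed
// 16-bit samples. A trailing odd byte is ignored.
std::vector<int16_t> bytesToPcm16LE(const std::vector<uint8_t>& bytes) {
    std::vector<int16_t> out;
    out.reserve(bytes.size() / 2);
    for (std::size_t i = 0; i + 1 < bytes.size(); i += 2) {
        int value = bytes[i] | (bytes[i + 1] << 8); // 0..65535
        if (value >= 32768) {
            value -= 65536; // reinterpret as two's-complement negative
        }
        out.push_back(static_cast<int16_t>(value));
    }
    return out;
}

// Hypothetical inverse: split each signed 16-bit sample into two bytes,
// low byte first.
std::vector<uint8_t> pcm16ToBytesLE(const std::vector<int16_t>& samples) {
    std::vector<uint8_t> out;
    out.reserve(samples.size() * 2);
    for (int16_t s : samples) {
        uint16_t u = static_cast<uint16_t>(s);
        out.push_back(static_cast<uint8_t>(u & 0xFF));
        out.push_back(static_cast<uint8_t>(u >> 8));
    }
    return out;
}
```

A round trip through both functions should return the original bytes, which is a quick way to check that the byte order matches what the decoder produces.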

Format settings for iOS multimixer au when using a bluetooth endpoint

Hi Core Audio/Au community,
I have hit a roadblock during development. My current AUGraph is set up as 2 mono streams -> mixer unit -> remoteIO unit on an iOS platform. I am using the mixer to mix two mono streams into interleaved stereo. However, the requirement is that the mono streams need not be mixed at all times while being played out in stereo, i.e. the interleaved stereo output should be composed of the 1st mono stream in the left ear and the 2nd mono stream in the right ear. I am able to accomplish this using the kAudioUnitProperty_MatrixLevels property on the Multichannel mixer.
//Left out //right out
matrixVolumes[0][0]=1; matrixVolumes[0][1]=0.001;
matrixVolumes[1][0]=0.001; matrixVolumes[1][1]=0.001;
result = AudioUnitSetProperty(mAumixer, kAudioUnitProperty_MatrixLevels, kAudioUnitScope_Input, 0,matrixVolumes , matrixPropSize);
if (result) {
printf("Error while setting kAudioUnitProperty_MatrixLevels from mixer on bus 0 %ld %08X %4.4s\n", result, (unsigned int)result, (char*)&result);
return -1;
}
printf("setting matrix levels kAudioUnitProperty_MatrixLevels on bus 1 \n");
//Left out //right out
matrixVolumes[0][0]=0.001; matrixVolumes[0][1]=1;
matrixVolumes[1][0]=0.001; matrixVolumes[1][1]=0.001;
result = AudioUnitSetProperty(mAumixer, kAudioUnitProperty_MatrixLevels, kAudioUnitScope_Input, 1,matrixVolumes , matrixPropSize);
if (result) {
printf("Error while setting kAudioUnitProperty_MatrixLevels from mixer on bus 1 %ld %08X %4.4s\n", result, (unsigned int)result, (char*)&result);
return -1;
}
As shown above I am using the volume controls to control the streams playing as unmixed stereo interleaved separately. This works fine when I am using a wired headset to play the audio; the 1st mono stream plays on the left ear and the second mono stream plays on the right ear. But when I am switching to a bluetooth headset the audio output is a mix of both the mono streams playing on both the left and right channel. So, the matrix levels do not seem to work there. The formats used for the i/p and o/p of the mixer are as follows:
printf("create Input ASBD\n");
// client format audio goes into the mixer
obj->mInputFormat.mFormatID = kAudioFormatLinearPCM;
int sampleSize = ((UInt32)sizeof(AudioSampleType));
obj->mInputFormat.mFormatFlags = kAudioFormatFlagsCanonical;
obj->mInputFormat.mBitsPerChannel = 8 * sampleSize;
obj->mInputFormat.mChannelsPerFrame = 1; //mono
obj->mInputFormat.mFramesPerPacket = 1;
obj->mInputFormat.mBytesPerPacket = obj->mInputFormat.mBytesPerFrame = sampleSize;
obj->mInputFormat.mFormatFlags |= kAudioFormatFlagIsNonInterleaved;
// obj->mInputFormat.mSampleRate = obj->mGlobalSampleRate;(// set later while initializing audioStreamer or from the client app)
printf("create output ASBD\n");
// output format for the mixer unit output bus
obj->mOutputFormat.mFormatID = kAudioFormatLinearPCM;
obj->mOutputFormat.mFormatFlags = kAudioFormatFlagsCanonical | (kAudioUnitSampleFractionBits << kLinearPCMFormatFlagsSampleFractionShift);
obj->mOutputFormat.mChannelsPerFrame = 2;//stereo
obj->mOutputFormat.mFramesPerPacket = 1;
obj->mOutputFormat.mBitsPerChannel = 8 * ((UInt32)sizeof(AudioUnitSampleType));
obj->mOutputFormat.mBytesPerPacket = obj->mOutputFormat.mBytesPerFrame = 2 * ((UInt32)sizeof(AudioUnitSampleType));
// obj->mOutputFormat.mSampleRate = obj->mGlobalSampleRate; (// set later while initializing)
N.B.: I am setting the sample rates separately from the application. The input and output sample rates of the mixer are the same as the sample rate of the mono audio files.
Thanks in advance for taking a look at the issue...:)

AAC stream resampled incorrectly

I have a very particular problem that I hope to find an answer to.
I'm trying to read an AAC stream from a URL (online streaming radio, e.g. live.noroc.tv:8000/radionoroc.aacp) with the NAudio library and get IEEE 32-bit floating-point samples.
In addition, I would like to resample the stream to a particular sample rate and channel count (5512 Hz, mono).
Below is the code which accomplishes that:
int tenSecondsOfDownloadedAudio = 5512 * 10;
float[] buffer = new float[tenSecondsOfDownloadedAudio];
using (var reader = new MediaFoundationReader(pathToUrl))
{
var ieeeFloatWaveFormat = WaveFormat.CreateIeeeFloatWaveFormat(5512, 1); // mono
using (var resampler = new MediaFoundationResampler(reader, ieeeFloatWaveFormat))
{
var waveToSampleProvider = new WaveToSampleProvider(resampler);
int readSamples = 0;
float[] tempBuffer = new float[5512]; // 1 second buffer
while (readSamples < tenSecondsOfDownloadedAudio)
{
int read = waveToSampleProvider.Read(tempBuffer, 0, tempBuffer.Length);
if (read == 0)
{
Thread.Sleep(500); // allow streaming buffer to get loaded
continue;
}
// copy only the samples actually read, without overrunning the output buffer
int toCopy = Math.Min(read, buffer.Length - readSamples);
Array.Copy(tempBuffer, 0, buffer, readSamples, toCopy);
readSamples += toCopy;
}
}
}
These particular samples are then written to a Wave audio file using the following simple method:
using (var writer = new WaveFileWriter("path-to-audio-file.wav", WaveFormat.CreateIeeeFloatWaveFormat(5512, 1)))
{
writer.WriteSamples(samples, 0, samples.Length);
}
What I've encountered is that NAudio does not read 10 seconds of audio (as requested) but only 5, even though the buffer array gets fully loaded with samples (which at this rate and channel count should hold 10 seconds of audio).
Thus the final audio file plays the stream twice as slowly as it should (a 5-second stream is played over 10 seconds).
Is this somehow related to different bit depths (should I record at 64 bits per sample as opposed to 32)?
I do my testing on Windows Server 2008 R2 x64, with MFT codecs installed.
Would really appreciate any suggestions.
The problem seems to be that MediaFoundationReader fails to handle HE-AACv2 in an ADTS container, which is a standard online radio stream format and most likely the one you are dealing with.
Adobe products have the same problem, mistreating this format in exactly the same way (stretching the first half of the audio to the whole duration): Corrupted AAC files recorded from online stream
Supposedly, it has something to do with an HE-AACv2 stereo stream actually being a mono stream with an additional information channel for Parametric Stereo.
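The numbers in the question can be sanity-checked with a little arithmetic. The sketch below is plain C++ with hypothetical helper names; it does not model NAudio or Media Foundation. It only shows that 55120 mono samples at 5512 Hz play for 10 seconds, so if they hold only 5 seconds of programme material, the content is effectively sampled at twice the declared rate.

```cpp
#include <cstddef>

// Seconds a buffer lasts when played back at the declared sample rate.
double playedSeconds(std::size_t sampleCount, int channels, int sampleRate) {
    return static_cast<double>(sampleCount) /
           (static_cast<double>(channels) * sampleRate);
}

// Sample rate the buffer would need to be played at so that its content
// lasts contentSeconds (i.e. the rate the data effectively has).
double effectiveRate(std::size_t sampleCount, int channels, double contentSeconds) {
    return static_cast<double>(sampleCount) / (channels * contentSeconds);
}
```

With the question's figures, playedSeconds(55120, 1, 5512) gives 10 seconds and effectiveRate(55120, 1, 5.0) gives 11024 Hz, which matches the "plays twice as slowly" symptom.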

LibAV - what approach to take for realtime audio and video capture?

I'm using libav to encode raw RGB24 frames to H.264 and mux them into FLV. This all works fine, and I've streamed for more than 48 hours without any problems. My next step is to add audio to the stream. I'll be capturing live audio and I want to encode it in real time using Speex, MP3 or Nellymoser.
Background info
I'm new to digital audio, so I might be doing things wrong. Basically, my application gets a "float" buffer with interleaved audio. This "audioIn" function gets called by the application framework I'm using. The buffer contains 256 samples per channel, and I have 2 channels. Because I might be mixing up terminology, this is how I use the data:
// input = array with audio samples
// bufferSize = 256
// nChannels = 2
void audioIn(float * input, int bufferSize, int nChannels) {
// convert from float to S16
short* buf = new signed short[bufferSize * 2];
for(int i = 0; i < bufferSize; ++i) { // loop over all samples
int dx = i * 2;
buf[dx + 0] = (float)input[dx + 0] * numeric_limits<short>::max(); // convert frame of the first channel
buf[dx + 1] = (float)input[dx + 1] * numeric_limits<short>::max(); // convert frame of the second channel
}
// add this to the libav wrapper.
av.addAudioFrame((unsigned char*)buf, bufferSize, nChannels);
delete[] buf;
}
Now that I have a buffer where each sample is 16 bits, I pass this short* buffer to my wrapper function av.addAudioFrame(). In this function I create a buffer before I encode the audio. From what I read, the AVCodecContext of the audio encoder sets the frame_size. This frame_size must match the number of samples in the buffer when calling avcodec_encode_audio2(). I think this because of what is documented here, especially the line:
If it is not set, frame->nb_samples must be equal to avctx->frame_size for all frames except the last.
(Please correct me here if I'm wrong about this.)
After encoding I call av_interleaved_write_frame() to actually write the frame.
When I use mp3 as the codec, my application runs for about 1-2 minutes and then my server, which is receiving the video/audio stream (flv, tcp), disconnects with the message "Frame too large: 14485504". This message is generated because the rtmp-server is getting a frame which is way too big. This is probably because I'm not interleaving correctly with libav.
Questions:
There are quite a few bits I'm not sure of, even after going through the libav source code, so I'm hoping someone has a working example of encoding audio that comes from a buffer "outside" libav (i.e. from your own application). How do you create a buffer which is large enough for the encoder? How do you make the "realtime" streaming work when you need to wait for this buffer to fill up?
As I wrote above, I need to keep track of a buffer before I can encode. Does anyone have some code which does this? I'm using AVAudioFifo now. The functions which encode the audio and fill/read the buffer are here too: https://gist.github.com/62f717bbaa69ac7196be
I compiled with --enable-debug=3 and disabled optimizations, but I'm not seeing any debug information. How can I make libav more verbose?
Thanks!
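The buffering problem described above (256-sample chunks arriving, while the encoder wants exactly frame_size samples per call) can be sketched without libav as a simple FIFO. The FrameFifo class below is a hypothetical stand-in for the role AVAudioFifo plays, for interleaved S16 data only. In a real program the frame size would come from the encoder's AVCodecContext::frame_size rather than being hard-coded.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// Hypothetical stand-in for the role AVAudioFifo plays: collect interleaved
// S16 audio in arbitrary chunk sizes and hand it out in fixed-size frames.
class FrameFifo {
public:
    FrameFifo(std::size_t frameSize, std::size_t channels)
        : frameSize_(frameSize), channels_(channels) {}

    // Append samplesPerChannel samples for every channel (interleaved).
    void write(const int16_t* data, std::size_t samplesPerChannel) {
        fifo_.insert(fifo_.end(), data, data + samplesPerChannel * channels_);
    }

    // True once at least one full encoder frame is buffered.
    bool hasFrame() const { return fifo_.size() >= frameSize_ * channels_; }

    // Samples per channel currently waiting in the FIFO.
    std::size_t bufferedSamplesPerChannel() const { return fifo_.size() / channels_; }

    // Remove and return exactly one frame; call only when hasFrame() is true.
    std::vector<int16_t> readFrame() {
        const auto n = static_cast<std::ptrdiff_t>(frameSize_ * channels_);
        std::vector<int16_t> frame(fifo_.begin(), fifo_.begin() + n);
        fifo_.erase(fifo_.begin(), fifo_.begin() + n);
        return frame;
    }

private:
    std::size_t frameSize_;
    std::size_t channels_;
    std::deque<int16_t> fifo_;
};
```

In the audioIn callback one would write each incoming chunk and then loop with while (fifo.hasFrame()), encoding one frame per iteration. With 256-sample chunks and a 1152-sample frame (the frame size commonly used for MP3), the first frame becomes available after the fifth chunk, with 128 samples per channel left over for the next one.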
