Weird Audio Blips in Recorded USB Audio Signal - audio

Might be the wrong place to ask this but I kinda need help figuring out what the real issue is...
Basically, I'm programming a microcontroller to do USB audio recording (using USB Audio Class 2.0 / high speed USB). Seems like I'm getting pretty close to getting it" aright" but I'm getting this when I record a chirp into audacity [below is an excerpt]:
I guess what I'm asking is why am I getting these weird interruptions and jumps in my recording session? Is it because I'm not reading my codec input buffers quickly enough or perhaps the frame length isn't being set correctly?
How I'm calculating my frame length which I got from a USB Audio guide from Apple (using 44.1kHz sample rate and 16 bit rate):
#define AUDIO_POLL_INT 4
#define FRAME_BYTES (BIT_RATE_16 / 8)
#define NUM_CHANNELS STEREO
uint16_t frame_len = 44 (44.1kHz/1000 samples) * NUM_CHANNELS * FRAME_BYTES;
if (!(frame_pos % 9)) frame_len += (1 * NUM_CHANNELS * FRAME_BYTES)
frame_len = (frame_len / 8) * (2 << (AUDIO_POLL_INT-1));
// 10 ms frames
frame_pos = (((frame_pos + 1) / 8) * (2 << (AUDIO_POLL_INT-1))) % 10;
This is also the process of reading codec input:
1) Read Codec Input; Load samples into temporary buffer (transfers from codec peripheral to a memory peripheral)
2) Memory Peripheral interrupt occurs when a transfer is complete (buffer is full; frame_len capacity has been fulfilled), send buffer samples to USB. Afterwards, read codec input again
Hopefully this isn't too confusing... let me know and I can add more info/clear up things. Thanks!

it doesn't look (to my eye) that you're dropping packets, just miss-ordering them.
Look at each of your 'blips'. It's pretty clear that each 'blip' actually is a stretch of audio that fits in the waveform about 1/3rd the waveform back in time.
Likely you've got a circular buffer and you're having issues with your read/write pointer or usage manager getting lost and sending out a buffer way late.
I don't think it's (directly related to, anyways) sample rate conversion or anything like that.

Related

ESP8266 analogRead() microphone Input into playable audio

My goal is to record audio using an electret microphone hooked into the analog pin of an esp8266 (12E) and then be able to play this audio on another device. My circuit is:
In order to check the output of the microphone I connected the circuit to the oscilloscope and got this:
In the "gif" above you can see the waves made by my voice when talking to microphone.
here is my code on esp8266:
void loop() {
sensorValue = analogRead(sensorPin);
Serial.print(sensorValue);
Serial.print(" ");
}
I would like to play the audio on the "Audacity" software in order to have an understanding of the result. Therefore, I copied the numbers from the serial monitor and paste it into the python code that maps the data to (-1,1) interval:
def mapPoint(value, currentMin, currentMax, targetMin, targetMax):
currentInterval = currentMax - currentMin
targetInterval = targetMax - targetMin
valueScaled = float(value - currentMin) / float(currentInterval)
return round(targetMin + (valueScaled * targetInterval),5)
class mapper():
def __init__(self,raws):
self.raws=raws.split(" ")
self.raws=[float(i) for i in self.raws]
def mapAll(self):
self.mappeds=[mapPoint(i,min(self.raws),max(self.raws),-1,1) for i in self.raws ]
self.strmappeds=str(self.mappeds).replace(",","").replace("]","").replace("[","")
return self.strmappeds
Which takes the string of numbers, map them on the target interval (-1 ,+1) and return a space (" ") separated string of data ready to import into Audacity software. (Tools>Sample Data Import and then select the text file including the data). The result of importing data from almost 5 seconds voice:
which is about half a second and when I play I hear unintelligible noise. I also tried lower frequencies but there was only noise there, too.
The suspected causes for the problem are:
1- Esp8266 has not the capability to read the analog pin fast enough to return meaningful data (which is probably not the case since it's clock speed is around 100MHz).
2- The way software is gathering the data and outputs it is not the most optimized way (In the loop, Serial.print, etc.)
3- The microphone circuit output is too noisy. (which might be, but as observed from the oscilloscope test, my voice has to make a difference in the output audio. Which was not audible from the audacity)
4- The way I mapped and prepared the data for the Audacity.
Is there something else I could try?
Are there similar projects out there? (which to my surprise I couldn't find anything which was done transparently!)
What can be the right way to do this? (since it can be a very useful and economic method for recording, transmitting and analyzing audio.)
There are many issues with your project:
You do not set a bias voltage on A0. The ADC can only measure voltages between Ground and VCC. When removing the microphone from the circuit, the voltage at A0 should be close to VCC/2. This is usually achieved by adding a voltage divider between VCC and GND made of 2 resistors, and connected directly to A0. Between the cap and A0.
Also, your circuit looks weird... Is the 47uF cap connected directly to the 3.3V ? If that's the case, you should connect it to pin 2 of the microphone instead. This would also indicate that right now your ADC is only recording noise (no bias voltage will do that).
You do not pace you input, meaning that you do not have a constant sampling rate. That is a very important issue. I suggest you set yourself a realistic target that is well within the limits of the ADC, and the limits of your serial port. The transfer rate in bytes/sec of a serial port is usually equal to baud-rate / 8. For 9600 bauds, that's only about 1200 bytes/sec, which means that once converted to text, you max transfer rate drops to about 400 samples per second. This issue needs to be addressed and the max calculated before you begin, as the max attainable overall sample rate is the maximum of the sample rate from the ADC and the transfer rate of the serial port.
The way to grab samples depends a lot on your needs and what you are trying to do with this project, your audio bandwidth, resolution and audio quality requirements for the application and the amount of work you can put into it. Reading from a loop as you are doing now may work with a fast enough serial port, but the quality will always be poor.
The way that is usually done is with a timer interrupt starting the ADC measurement and an ADC interrupt grabbing the result and storing it in a small FIFO, while the main loop transfers from this ADC fifo to the serial port, along the other tasks assigned to the chip. This cannot be done directly with the Arduino libraries, as you need to control the ADC directly to do that.
Here a short checklist of things to do:
Get the full ESP8266 datasheet from Expressif. Look up the actual specs of the ADC, mainly: the sample rates and resolutions available with your oscillator, and also its electrical constraints, at least its input voltage range and input impedance.
Once you know these numbers, set yourself some target, the math needed for successful project need input numbers. What is your application? Do you want to record audio or just detect a nondescript noise? What are the minimum requirements needed for things to work?
Look up in the Arduino documentartion how to set up a timer interrupt and an ADC interrupt.
Look up in the datasheet which registers you'll need to access to configure and run the ADC.
Fix the voltage bias issue on the ADC input. Nothing can work before that's done, and you do not want to destroy your processor.
Make sure the input AC voltage (the 'swing' voltage) is large enough to give you the results you want. It is not unusual to have to amplify a mic signal (with an opamp or a transistor), just for impedance matching.
Then you can start writing code.
This may sound awfully complex for such a small task, but that's what the average day of an embedded programmer looks like.
[EDIT] Your circuit would work a lot better if you simply replaced the 47uF DC blocking capacitor by a series resistor. Its value should be in the 2.2k to 7.6k range, to keep the circuit impedance within the 10k Ohms or so needed for the ADC. This would insure that the input voltage to A0 is within the operating limits of the ADC (GND-3.3V on the NodeMCU board, 0-1V with bare chip).
The signal may still be too weak for your application, though. What is the amplitude of the signal on your scope? How many bits of resolution does that range cover once converted by the ADC? Example, for a .1V peak to peak signal (SIG = 0.1), an ADC range of 0-3.3V (RNG = 3.3) and 10 bits of resolution (RES = 1024), you'll have
binary-range = RES * (SIG / RNG)
= 1024 * (0.1 / 3.3)
= 1024 * .03
= 31.03
A range of 31, which means around Log2(31) (~= 5) useful bits of resolution, is that enough for your application ?
As an aside note: The ADC will give you positive values, with a DC offset, You will probably need to filter the digital output with a DC blocking filter before playback. https://manual.audacityteam.org/man/dc_offset.html

AudioUnit (Mac) AudioUnitRender internal buffer clash

I recently designed a Sound recorder on a mac using AudioUnits. It was designed to behave like a video security system, recording continuously, with a graphics display of power levels for playback browsing.
I've noticed that every 85 minutes distortion appears for 3 minutes. After a day of elimination it appears that the sound acquisition that occurs before callback is called uses a circular buffer, and the callback's audioUnitRender function extracts from this buffer but with a slightly slower speed, which eventually causes the internal buffer write to wrap around and catch up with audioUnitRender reads. The duplex operation test shows the latency ever increasing, and after 85 minutes you hear about 200-300ms of latency and the noise begins as the render buffer frame has a combination of buffer segments at end and beginning of buffer, i.e long and short latencies. as the pointers drift apart the noise disappears and you hear clean audio with original short latency, then it repeats again 85 mins later. Even with low impact callback processing this still happens. I've seen some posts regarding latency but none regarding clashes, has anyone seen this?
osx 10.9.5, xcode 6.1.1
code details:-
//modes 1=playback, 2=record, 3=both
AudioComponentDescription outputcd = {0}; // 10.6 version
outputcd.componentType = kAudioUnitType_Output;
outputcd.componentSubType = kAudioUnitSubType_HALOutput; //allows duplex
outputcd.componentManufacturer = kAudioUnitManufacturer_Apple;
AudioComponent comp = AudioComponentFindNext (NULL, &outputcd);
if (comp == NULL) {printf ("can't get output unit");exit (-1);}
CheckError (AudioComponentInstanceNew(comp, au),"Couldn't open component for outputUnit");
//tell input bus that its's input, tell output it's an output
if(mode==1 || mode==3) r=[self setAudioMode:*au :0];//play
if(mode==2 || mode==3) r=[self setAudioMode:*au :1];//rec
// register render callback
if(mode==1 || mode==3) [self setCallBack:*au :0];
if(mode==2 || mode==3) [self setCallBack:*au :1];
// if(mode==2 || mode==3) [self setAllocBuffer:*au];
// get default stream, change amt of channels
AudioStreamBasicDescription audioFormat;
UInt32 k=sizeof(audioFormat);
r= AudioUnitGetProperty(*au,
kAudioUnitProperty_StreamFormat,
kAudioUnitScope_Output,
1,
&audioFormat,
&k);
audioFormat.mChannelsPerFrame=1;
r= AudioUnitSetProperty(*au,
kAudioUnitProperty_StreamFormat,
kAudioUnitScope_Output,
1,
&audioFormat,
k);
//start
CheckError (AudioUnitInitialize(outputUnit),"Couldn't initialize output unit");
//record callback
OSStatus RecProc(void *inRefCon,
AudioUnitRenderActionFlags *ioActionFlags,
const AudioTimeStamp *inTimeStamp,
UInt32 inBusNumber,
UInt32 inNumberFrames,
AudioBufferList * ioData)
{
myView * mv2=(__bridge myView*)inRefCon;
AudioBuffer buffer,buffer2;
OSStatus status;
buffer.mDataByteSize = inNumberFrames *4 ;// buffer size
buffer.mNumberChannels = 1; // one channel
buffer.mData =mv2->rdata;
buffer2.mDataByteSize = inNumberFrames *4 ;// buffer size
buffer2.mNumberChannels = 1; // one channel
buffer2.mData =mv2->rdata2;
AudioBufferList bufferList;
bufferList.mNumberBuffers = 2;
bufferList.mBuffers[0] = buffer;
bufferList.mBuffers[1] = buffer2;
status = AudioUnitRender(mv2->outputUnit, ioActionFlags, inTimeStamp, inBusNumber, inNumberFrames, &bufferList);
[mv2 recproc :mv->rdata :mv->rdata2 :inNumberFrames];
return noErr;
}
You seem to be using the HAL output unit for pulling input. There might not be a guarantee that the input device and output device sample rates are exactly locked. Any slow slight drift in the sample rate of either device could eventually cause a buffer underflow or overflow.
One solution might be to find and set an input device for a separate input audio unit instead of depending on the default output unit. Try a USB mic, for instance.
According to this article https://www.native-instruments.com/forum/threads/latency-drift-problem-on-macbook.175551/ this problem appears to be a usb audio driver bug in maverick. I didn't find a kext replacement solution anywhere.
After making a sonar type tester (1 cycle 22khz square wave click every 600 ms to speaker, display selected recorded frame number after click) and could see the 3 to 4 samples drift per second along with the distortion/latency drift reset experience after 1.5 hrs, I decided to look around and find how to access the buffer pointers to stabilise the latency drift, but also no luck.
Also api latency queries show no changes as it drifts.
I did find that you could reset the latency with audiounitstop then audiounitstart (same thread), but it worked only if only one audiounit bus system wide was active. Research also showed that the latency could be reset if you toggle the hardware device sample-rate in Audio Midi Setup. this is a bit aggressive and would be uncomfortable for some.
My design toggled the nominalsamplerate (AudioObjectSetPropertyData with kAudioDevicePropertyNominalSampleRate) every 60 minutes (48000 then back to 44100), with delay by way of waiting for change notification through a callback.
This cause a 2 second void in audio input and output every hour. Safari playing a youtube video would mute, and cause a 1-2 second video freeze during this time . VLC showed the same but video remained smooth during 2 second silence.
Like I said, it wouldn't work for all, but I chose system wide 2 second mute every hour over a recording that has 3 minutes of fuzzy audio every 1.5 hrs. Its been posted that a yosemite upgrade fixes this, although some have also found crackling after going up to yosemite.

FFMPEG Understanding AVFrame::linesize (Audio)

As per the doucmentation of AVFrame, for audio, lineSize is size in bytes of each plane and only linesize[0] may be set. But however, am unsure whether lineszie[0] is holding per plane buffer size or is it the complete buffer size and we have to divide it by no of channels to get per plane buffer size.
For Example, when I call
int data_size = av_samples_get_buffer_size(NULL, iDesiredNoOfChannels, iAudioSamples, (AVSampleFormat)iDesiredFormat, 0) ; For iDesiredNoOfChannels = 2, iAudioSamples = 1024 & iDesiredFormat = AV_SAMPLE_FMT_FLTP data_size=8192. Pretty straightforward, as each sample is 4 bytes and since there are 2 channels total memory will be (1024 * 4 * 2) bytes. As such lineSize[0] should be 4096 for planar audio. data[0] & data[1] should be each of size 4096. However, pFrame->lineSize[0] is giving 8192. So to get the size per plane, I have to do pFrame->lineSize[0] / pFrame->channels. Isn't this behaviour different from what the documentation suggests or is my understanding of the documentaion wrong.
Old question but thought I'd answer it anyway for people who may be wondering the same thing.
In all audio AVFrames, only linesize[0] may be set and they are all required to be the same size. You should not be using linesize[1], etc. I don't know why they chose to do things this way because it's not consistent with video frames, but whatever. Just remember whether interleaved or planar only linesize[0] matters so you have to divide by channel count for planar.

XNA Microphone audio buffer format?

I'm working on an XNA script in which I want to read data from the microphone every couple of frames and estimate its pitch. I took input based almost exactly on this page (http://msdn.microsoft.com/en-us/library/ff827802.aspx).
Now I've got a buffer full bytes. What does it represent? I reset everything and look at my buffer every 10th frame, so it appears to be a giant array that has 9 instances of 1764 bytes at different points in time (The whole thing is 15876 bytes large). I'm assuming it's the time domain of sound pressure, because I can't find any information on the format of microphone input. Anybody know how this works? I have a friend who has an FFT up and running, but we're trying to learn as much as we can about that data I'm collecting before we attempt to plug it in.
The samples are in Little-Endian 16 bit Linear PCM. Convert each pair of bytes into a signed short as
short sample = (short)(buffer[i] | buffer[i+1] << 8);

a-law/raw audio data

I have spent the evening messing around with raw A-law audio input/output from the built in ALSA tools aplay and arecord, and passing them through an offline moving average filter I have written.
My question is: the audio seems to be encoded using values between 0x2A and 0xAA - a range of 128. I have been reading through this guide which is informative but doesn't really explain why and offset of 42 (0x2A) has been chosen. The file I used to examine this was a square wave exported from audacity as unsigned 8-bit 8kHz audio and examined in a hex editor.
Can anyone shed some light on how A-law is encoded in a file?
This may help;
/dev/dsp
8000 frames per second, 8 bits per frame (1 byte);
# Max volume = \xff (or \x00).
# No volume = \x80 (the middle).

Resources