Does ADPCM have a sample rate? - audio

ADPCM is adaptive, so it has a variable sample rate. But does it have some average rate or something? Does it have frames of fixed time duration?

You misunderstood it here :-). "Adaptive" doesn't mean that the sample rate is adjusted according to the signal it contains.
"Adaptive" means that the limited number of available delta steps (4 bits = only 16 possibilities to encode a sample) is adapted to the signal by prediction. The encoder estimates from a given sample which value the next sample may have and adapts the delta steps to that.
If the signal changes little from sample to sample, the steps are chosen closer together than if the signal changes a lot. It is very unlikely that the signal goes from strongly oscillating to quiet from one sample to the next.
You can see that behavior if you encode a 100 Hz square wave using such an algorithm and re-open it in an audio editor that displays the waveform. When the waveform changes from one polarity to the other, the signal "speeds up" (the steps get further and further apart) until it approaches the other end, and then it slows down again (the steps get closer and closer together).
It still has a fixed sample rate: the one you give it. In RIFF WAVE, the sample rate is stored in the header.
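To make the "adaptive step, fixed rate" point concrete, here is a toy adaptive delta coder in Python. It is only a sketch and is not IMA ADPCM or any other real codec: the function name, the step limits and the doubling/halving rule are all made up for illustration. What it shows is that exactly one 4-bit code is produced per input sample, and only the step size changes.

```python
def toy_adaptive_delta(samples, min_step=1.0, max_step=1024.0):
    """Toy adaptive delta coder (hypothetical, not a real ADPCM variant).

    Emits exactly one signed 4-bit code (-8..7) per input sample, so the
    sample rate never changes; only the step size adapts to the signal.
    """
    predicted = 0.0
    step = min_step
    codes, decoded = [], []
    for s in samples:
        # Quantise the prediction error to a signed 4-bit code.
        code = max(-8, min(7, int(round((s - predicted) / step))))
        predicted += code * step
        codes.append(code)
        decoded.append(predicted)
        # Adapt: large codes widen the steps, small codes narrow them.
        if abs(code) >= 6:
            step = min(max_step, step * 2.0)
        elif abs(code) <= 2:
            step = max(min_step, step / 2.0)
    return codes, decoded
```

Feeding it a square wave and plotting `decoded` should show the behavior described above: after each polarity change the steps grow until the output approaches the new level, then shrink again.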

Related

ESP8266 analogRead() microphone Input into playable audio

My goal is to record audio using an electret microphone hooked into the analog pin of an esp8266 (12E) and then be able to play this audio on another device. My circuit is:
In order to check the output of the microphone I connected the circuit to the oscilloscope and got this:
In the "gif" above you can see the waves made by my voice when talking to microphone.
here is my code on esp8266:
void loop() {
  sensorValue = analogRead(sensorPin);
  Serial.print(sensorValue);
  Serial.print(" ");
}
I would like to play the audio in the "Audacity" software in order to get an understanding of the result. Therefore, I copied the numbers from the serial monitor and pasted them into Python code that maps the data to the (-1,1) interval:
def mapPoint(value, currentMin, currentMax, targetMin, targetMax):
    # Linearly rescale value from [currentMin, currentMax] to [targetMin, targetMax]
    currentInterval = currentMax - currentMin
    targetInterval = targetMax - targetMin
    valueScaled = float(value - currentMin) / float(currentInterval)
    return round(targetMin + (valueScaled * targetInterval), 5)

class mapper():
    def __init__(self, raws):
        # split() without arguments also tolerates the trailing space left by Serial.print(" ")
        self.raws = [float(i) for i in raws.split()]

    def mapAll(self):
        # Compute the extremes once instead of once per sample
        lo, hi = min(self.raws), max(self.raws)
        self.mappeds = [mapPoint(i, lo, hi, -1, 1) for i in self.raws]
        self.strmappeds = str(self.mappeds).replace(",", "").replace("]", "").replace("[", "")
        return self.strmappeds
This takes the string of numbers, maps them onto the target interval (-1, +1) and returns a space (" ") separated string of data ready to import into Audacity (Tools > Sample Data Import, then select the text file containing the data). The result of importing the data from almost 5 seconds of voice:
The imported clip is only about half a second long, and when I play it I hear unintelligible noise. I also tried lower frequencies, but there was only noise there, too.
The suspected causes for the problem are:
1- The ESP8266 is not capable of reading the analog pin fast enough to return meaningful data (which is probably not the case, since its clock speed is around 100MHz).
2- The way software is gathering the data and outputs it is not the most optimized way (In the loop, Serial.print, etc.)
3- The microphone circuit output is too noisy. (which might be, but as observed from the oscilloscope test, my voice has to make a difference in the output audio. Which was not audible from the audacity)
4- The way I mapped and prepared the data for the Audacity.
Is there something else I could try?
Are there similar projects out there? (which to my surprise I couldn't find anything which was done transparently!)
What can be the right way to do this? (since it can be a very useful and economic method for recording, transmitting and analyzing audio.)
There are many issues with your project:
You do not set a bias voltage on A0. The ADC can only measure voltages between ground and VCC. With the microphone removed from the circuit, the voltage at A0 should be close to VCC/2. This is usually achieved by adding a voltage divider made of 2 resistors between VCC and GND, with its midpoint connected directly to A0, between the cap and A0.
Also, your circuit looks weird... Is the 47uF cap connected directly to the 3.3V ? If that's the case, you should connect it to pin 2 of the microphone instead. This would also indicate that right now your ADC is only recording noise (no bias voltage will do that).
You do not pace your input, meaning that you do not have a constant sampling rate. That is a very important issue. I suggest you set yourself a realistic target that is well within the limits of the ADC and the limits of your serial port. The transfer rate in bytes/sec of a serial port is usually about baud-rate / 10 (with the common 8N1 framing, each byte costs 10 bits on the wire: a start bit, 8 data bits and a stop bit). For 9600 baud, that's only about 960 bytes/sec, which means that once converted to text (up to 4 digits plus a separator per sample), your max transfer rate drops to roughly 200 samples per second. This issue needs to be addressed and the max calculated before you begin, as the max attainable overall sample rate is the minimum of the sample rate from the ADC and the transfer rate of the serial port.
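A quick way to budget the serial link is to do the arithmetic in a few lines of Python. This is only a sketch: the function names are hypothetical, and it assumes 8N1 framing (10 bits on the wire per byte) and a fixed number of text characters per sample.

```python
def text_samples_per_second(baud, chars_per_sample, bits_per_byte=10):
    """Samples/sec a serial port can carry when each sample is sent as text.

    Assumes 8N1 framing by default: 1 start + 8 data + 1 stop = 10 bits per byte.
    """
    bytes_per_second = baud / bits_per_byte
    return bytes_per_second / chars_per_sample


def attainable_sample_rate(adc_rate, serial_rate):
    """The slower of the two stages limits the overall sample rate."""
    return min(adc_rate, serial_rate)
```

For instance, `text_samples_per_second(9600, 5)` covers a 4-digit reading plus one space, and the same call with a higher baud rate shows how much headroom a faster port buys. Sending raw binary samples instead of text reduces `chars_per_sample` to the number of bytes per sample.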
The way to grab samples depends a lot on your needs and what you are trying to do with this project, your audio bandwidth, resolution and audio quality requirements for the application and the amount of work you can put into it. Reading from a loop as you are doing now may work with a fast enough serial port, but the quality will always be poor.
The way that is usually done is with a timer interrupt starting the ADC measurement and an ADC interrupt grabbing the result and storing it in a small FIFO, while the main loop transfers from this ADC FIFO to the serial port, alongside the other tasks assigned to the chip. This cannot be done directly with the Arduino libraries, as you need to control the ADC directly to do that.
Here is a short checklist of things to do:
Get the full ESP8266 datasheet from Espressif. Look up the actual specs of the ADC, mainly the sample rates and resolutions available with your oscillator, and also its electrical constraints, at least its input voltage range and input impedance.
Once you know these numbers, set yourself some targets; the math needed for a successful project needs input numbers. What is your application? Do you want to record audio or just detect a nondescript noise? What are the minimum requirements needed for things to work?
Look up in the Arduino documentation how to set up a timer interrupt and an ADC interrupt.
Look up in the datasheet which registers you'll need to access to configure and run the ADC.
Fix the voltage bias issue on the ADC input. Nothing can work before that's done, and you do not want to destroy your processor.
Make sure the input AC voltage (the 'swing' voltage) is large enough to give you the results you want. It is not unusual to have to amplify a mic signal (with an opamp or a transistor), just for impedance matching.
Then you can start writing code.
This may sound awfully complex for such a small task, but that's what the average day of an embedded programmer looks like.
[EDIT] Your circuit would work a lot better if you simply replaced the 47uF DC blocking capacitor with a series resistor. Its value should be in the 2.2k to 7.6k range, to keep the circuit impedance within the 10k Ohms or so needed for the ADC. This would ensure that the input voltage to A0 is within the operating limits of the ADC (GND-3.3V on the NodeMCU board, 0-1V with the bare chip).
The signal may still be too weak for your application, though. What is the amplitude of the signal on your scope? How many bits of resolution does that range cover once converted by the ADC? For example, for a 0.1V peak-to-peak signal (SIG = 0.1), an ADC range of 0-3.3V (RNG = 3.3) and 10 bits of resolution (RES = 1024), you'll have
binary-range = RES * (SIG / RNG)
             = 1024 * (0.1 / 3.3)
             = 1024 * 0.0303
             = 31.03
A range of 31, which means around log2(31) (~= 5) useful bits of resolution. Is that enough for your application?
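The same estimate can be wrapped in a small Python helper so that other signal amplitudes or ADC ranges can be tried quickly. This is a sketch; the function name and parameters are hypothetical and simply mirror SIG, RNG and RES from the calculation above.

```python
import math


def useful_adc_bits(signal_vpp, adc_range_v, adc_levels=1024):
    """Return (codes spanned, approximate useful bits) for a given signal swing.

    signal_vpp  -- peak-to-peak signal amplitude in volts (SIG)
    adc_range_v -- full ADC input range in volts (RNG)
    adc_levels  -- number of ADC codes, 1024 for a 10-bit converter (RES)
    """
    codes = adc_levels * (signal_vpp / adc_range_v)
    return codes, math.log2(codes)
```

Calling `useful_adc_bits(0.1, 3.3)` reproduces the numbers above; trying a 1.0 V range instead shows how much a smaller ADC range (or an amplifier) helps.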
As an aside: the ADC will give you positive values with a DC offset. You will probably need to filter the digital output with a DC blocking filter before playback. https://manual.audacityteam.org/man/dc_offset.html
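If you prefer to remove the offset yourself before importing the data, a common one-pole DC blocking filter can be sketched in a few lines of Python. The function name and the pole value `r` are assumptions chosen for illustration; this is not what Audacity does internally.

```python
def dc_block(samples, r=0.995):
    """One-pole DC blocker: y[n] = x[n] - x[n-1] + r * y[n-1].

    r close to 1.0 gives a very low cutoff, so only the DC offset and
    very slow drift are removed. The value 0.995 is just a starting point.
    """
    if not samples:
        return []
    out = []
    prev_x = samples[0]  # start from the first sample to avoid a large initial step
    prev_y = 0.0
    for x in samples:
        y = x - prev_x + r * prev_y
        out.append(y)
        prev_x, prev_y = x, y
    return out
```

The output is centred on zero, so it can be scaled to the (-1, 1) range by dividing by the largest absolute value rather than by mapping min/max.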

How to decode amplitude modulation when signal crosses zero?

I'm trying to decode the image signal from a Mitsubishi VisiTel telephone image sender in a C++ program. It is encoded as an analog audio signal modulated with a sine wave carrier of ~1764Hz.
I'm reading the audio from the sound card input as signed 8-bits at 44.1kHz, which gives a period of about 25 samples for the carrier. Obviously, the analog signal is not going to fall nicely on sample boundaries, so assume that this could shift by +/-1 sample.
My first attempts to decode the signal were by taking the peaks of the signal and assigning those as pixel values. That almost worked, but there seemed to be some "off-phase" pixels and the image would eventually skew.
Eventually, I got a signal by decoupling the pixel clock from the peaks and tying it to the samples. I also had to time each scan line separately, as it didn't end on a pixel multiple somehow.
But this signal wasn't quite correct, dark areas were coming out inverted somehow.
Image with dark areas inverted
Eventually I realized that there was a phase discontinuity at the light/dark transition. This indicated to me that the modulation signal was going over the zero point, causing the phase discontinuity in the resulting signal as it drives the carrier negative, reversing the peak/ trough relationship.
Discontinuity in AM signal
While I could try to modify my state machine to detect this type of transition, it seems like it would be kinda messy and prone to error.
I keep thinking that there has to be a proper math-y way to demodulate an AM signal where the modulator crosses the zero point. But all of the examples I am finding seem to just be simple peak based envelope detectors. The product detector explanations I've found seem to count on you having your carrier and phase exactly correct, and I'm not sure that still buys me anything for zero crossing signals.
What is the correct party-approved way to demodulate AM signals where the modulator crosses zero?
A complex (quadrature or IQ) product detector is the way to go, even if your demodulation carrier is just close and not exact. A constant phase error only rotates the complex result by a fixed angle, and a small frequency error makes it rotate slowly, both of which can be removed at a later stage of processing.
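As a rough illustration of a quadrature product detector, here is a pure-Python sketch (the question uses C++, Python is used here only for brevity). Everything in it is an assumption made for the example: the function name, the moving-average low-pass over one carrier period, and the idea of estimating the carrier phase from a reference stretch where the modulator is known to be positive. It also assumes the local carrier frequency matches the transmitted one.

```python
import math


def iq_demodulate(samples, carrier_hz, sample_rate, ref_range):
    """Signed AM demodulation with a quadrature (IQ) product detector.

    ref_range is a (start, stop) slice of the filtered output where the
    modulating signal is assumed to be positive; it fixes the carrier phase.
    """
    w = 2.0 * math.pi * carrier_hz / sample_rate
    period = int(round(sample_rate / carrier_hz))  # 25 samples for 1764 Hz at 44.1 kHz

    # Mix with a local cosine and sine at the carrier frequency.
    i_mix = [x * math.cos(w * n) for n, x in enumerate(samples)]
    q_mix = [-x * math.sin(w * n) for n, x in enumerate(samples)]

    def moving_average(values, width):
        # Averaging over one carrier period removes the 2x-carrier component.
        out, acc = [], 0.0
        for k, v in enumerate(values):
            acc += v
            if k >= width:
                acc -= values[k - width]
            if k >= width - 1:
                out.append(acc / width)
        return out

    i_lp = moving_average(i_mix, period)
    q_lp = moving_average(q_mix, period)

    # Estimate the carrier phase from the reference stretch, then project
    # onto it so that the sign of the modulator is preserved.
    start, stop = ref_range
    phase = math.atan2(sum(q_lp[start:stop]), sum(i_lp[start:stop]))
    c, s = math.cos(phase), math.sin(phase)
    return [2.0 * (i * c + q * s) for i, q in zip(i_lp, q_lp)]
```

Unlike a peak or envelope detector, the projected output goes negative when the modulator crosses zero instead of folding back up.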
You're going to need to determine the phase of the carrier, and then you can use a product detector. A quadrature detector would let you determine the phase after the fact, but since you have to do it anyway, you might as well do it first.
It is very likely that the VisiTel transmits a sync signal of some sort before the image that would have been used to determine the carrier phase and to indicate the start of picture transmission to the receiver. You should probably use that for its intended purpose.

gnuradio phase drift of AM demodulation

I am beginning a project using GNUradio and an inexpensive SDR.
http://www.amazon.com/gp/product/B00SXZDUAQ?psc=1&redirect=true&ref_=oh_aui_search_detailpage
One portion of the project requires me to generate a reference audio tone and compare the phase of that tone to demodulated audio.
To simulate this portion of the system, I have generated a simple GNUradio flowchart:
I had some issues with the source and demodulated audio in that they would drift relative to each other. This occurred on the scope sync on the original flowgraph. To aid in troubleshooting I sent the demodulated audio out thru the soundcard’s second channel and monitored both audio streams in addition to the modulated RF on an external oscilloscope:
Initially all seems well, but the demodulated audio drifts in relation to the original source and RF:
My question is: am I doing something wrong in the flowgraph or am I expecting too much performance out of an inexpensive SDR?
Thanks in advance for any insights
You cannot expect to see zero phase drift in anything short of a fully digital simulation, or a fully analog circuit with exactly one oscillator, because no two (physical) oscillators have identical frequencies.
In your case, there are two relevant oscillators involved:
The sample clock in the RTL-SDR unit.
The sample clock in your sound card output.
Within a GNU Radio flowgraph, there is no time reference per se and everything depends on the sources and sinks which are connected to hardware.
The relevant source in your flowgraph is the RTL-SDR hardware; insofar as its oscillator is different from its nominal value (28.8 MHz, as it happens), everything it produces will be off-frequency in an absolute sense (both RF carrier frequencies and audio frequencies of demodulated output).
But you don't actually have an absolute frequency reference; you have the tone produced by your sound card. The sound card has its own oscillator, which determines the rate at which samples are converted to analog signals, and therefore the rate at which samples are consumed from the flowgraph.
Therefore, your reference signal will drift relative to your received and demodulated signal, at a rate determined by the difference in frequency error between the two oscillators.
Additionally, since your sound card will be accepting samples from the flowgraph at a slightly different real-time rate than the RTL-SDR is producing them, you will notice periodic glitches in the audio as the error accumulates and must be dealt with; they will start occurring either immediately (if the source is slower than the sink, requiring the sound card to play silence instead) or after a delay for buffers to hit their maximum size (if the source is faster than the sink, requiring the RTL-SDR to drop some samples).
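To get a feel for the size of the drift, the relative frequency error can be turned into a phase drift rate. This is a back-of-the-envelope sketch in Python: the function name is hypothetical and the ppm figures you pass in are placeholders, not measured values for any particular RTL-SDR or sound card.

```python
def phase_drift_deg_per_second(tone_hz, source_ppm, sink_ppm):
    """Approximate drift of a demodulated tone against a reference tone.

    source_ppm and sink_ppm are the frequency errors of the two sample
    clocks in parts per million; only their difference matters.
    """
    relative_error = (source_ppm - sink_ppm) * 1e-6
    return 360.0 * tone_hz * relative_error
```

For example, a 1 kHz tone with clocks that differ by 20 ppm drifts by a few degrees per second, which is easily visible on a scope even though both oscillators are within a typical tolerance.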

Creating audio level meter - signal normalization

I have a program which tracks an audio signal in real time. For every processed sample I am able to read its value, in the range <-1, 1>.
I would like to create(and later display) audio level meter. From what I understand - to do it I need to keep converting my audio signal in real time, on each channel to dB and then display dB values on each channel in some graphical form of bars.
I am a bit lost how to do it and it should be simple matter. Would just normalization from <-1, 1> to <0, 1> (like... [n-sample +1]/2) and then calculating 20*log10 from each upcoming sample make it?
You can't plot the signal directly, as it is always varying between positive and negative.
Therefore you need to average out the strength of the signal every so many samples.
Say you're sampling at 44.1kHz, perhaps you might choose 4410 samples so you're updating your display 10 times per second.
So you calculate the RMS of your 4410 samples - see http://en.wikipedia.org/wiki/Root_mean_square
The RMS value is always positive.
You can then convert this to dB:
dBV = 20 * log10(Vrms)
This assumes that your maximum signal -1 to +1 corresponds to -1 to +1 volt. You will need to do further adjustments if not.
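The steps above can be sketched in Python as follows. The function name is made up, and the level is expressed relative to a full-scale amplitude of 1.0, which matches the <-1, 1> sample range from the question.

```python
import math


def block_level_db(block):
    """RMS level of one block of samples in dB relative to an amplitude of 1.0.

    Uses 10 * log10(mean square), which equals 20 * log10(RMS).
    Returns -inf for a block of pure silence.
    """
    if not block:
        raise ValueError("block must contain at least one sample")
    mean_square = sum(x * x for x in block) / len(block)
    if mean_square == 0.0:
        return float("-inf")
    return 10.0 * math.log10(mean_square)
```

Call it once per block (for example every 4410 samples per channel) and map the result, clamped to something like -60 dB to 0 dB, onto the height of the meter bar.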

Digital delay decay

I am developing a digital delay on a microcontroller and I am stuck with the delay decay. The delay is implemented with a comb filter.
Here it is: http://www.tonmeister.ca/main/textbook/intro_to_sound_recording837x.png
The delay line, "emulating the tape", is implemented as a circular buffer. The effect can be killed outright, and that case is not an issue. When simply turning the effect off, though, I have the tail of the delay left in the buffer to process, as if the delay had been frozen and the tail slowly decays (depending on the feedback gain).
My question is: how many times do I have to recirculate samples through the buffer?
One way I thought to approach this could be by modelling the physical process ... assuming that the input sequence has a loudness of 0dB for its entire duration and that, after going through the delay line, it gets attenuated by a factor of 1/10. In terms of loudness this corresponds to a drop of 20dB, as power = voltage^2, every time the sequence goes through the feedback path. The weakest audible sound has a loudness of −130dB but, taking into consideration the ambient noise as well, −120dB will be sufficient as the least reference power. Hence, after the echoes have been through the feedback path 6 times (120dB/20dB) they will be no longer audible.
Is there a more efficient way?
Thank you!
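The estimate in the question generalises to any feedback gain: each pass loses -20 * log10(gain) dB, so the number of passes is the threshold divided by that loss, rounded up. A small Python sketch of that arithmetic follows; the function name and the 120 dB default are taken from the reasoning above rather than from any standard.

```python
import math


def recirculations_until_inaudible(feedback_gain, threshold_db=120.0):
    """Number of passes through the feedback path until the tail has
    dropped by at least threshold_db, for a linear feedback gain in (0, 1)."""
    if not 0.0 < feedback_gain < 1.0:
        raise ValueError("feedback_gain must be between 0 and 1 (exclusive)")
    loss_per_pass_db = -20.0 * math.log10(feedback_gain)
    # The tiny epsilon keeps floating-point rounding from adding a spurious pass.
    return math.ceil(threshold_db / loss_per_pass_db - 1e-9)
```

With a gain of 0.1 this gives the 6 passes worked out above, while a gain of 0.5 needs considerably more because each pass only removes about 6 dB.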
