I2S and PCM format - audio

Can someone explain the difference between the I2S interface and the PCM interface? Does the I2S interface only support I2S-format audio data and not PCM-format audio data?

PCM is a digital representation of an audio signal. It can be stored in memory or written on paper or whatever. An example of a 16-bit PCM audio sample might be something like 0x0152.
I2S is an electrical serial interface used to transmit PCM data from one device to another. The interface has a line used to delineate frames, called the frame clock; a line for marking individual bits, called the bit clock; and one or more lines for the data. At the start of each frame clock period, a PCM sample is serialized bit by bit, with a high voltage for a 1 and zero voltage for a 0. Each bit is held at that value for the entire duration of a bit clock period, and then the interface moves on to the next bit.
Here's some ASCII art showing how an 8-bit sample 0x55 (01010101 binary), single channel, might be transmitted. The frame clock runs at the sample rate, the bit clock at 8 times the sample rate, and the data line carries the serialized data.
      _______________                 _
FCLK _|               |_______________|
      _   _   _   _   _   _   _   _   _
BCLK _| |_| |_| |_| |_| |_| |_| |_| |_|
          ___     ___     ___     ___
DATA ___0_| 1 |_0_| 1 |_0_| 1 |_0_| 1 |_
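To make that concrete, here's a minimal sketch in plain C of how a transmitter could bit-bang one sample onto the data line, MSB first (the set_fclk/set_bclk/set_data helpers are hypothetical stand-ins for GPIO writes; real I2S is done in hardware with precise clocking):

#include <stdint.h>

/* Hypothetical GPIO helpers -- stand-ins for real pin writes. */
extern void set_fclk(int level);
extern void set_bclk(int level);
extern void set_data(int level);

void send_sample(uint8_t sample)            /* e.g. 0x55 */
{
    set_fclk(1);                            /* frame clock marks a new sample */
    for (int bit = 7; bit >= 0; bit--) {    /* MSB first */
        set_bclk(0);
        set_data((sample >> bit) & 1);      /* hold the bit for a full period */
        set_bclk(1);                        /* receiver samples on this edge */
    }
    set_fclk(0);
}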
The Wikipedia articles do a pretty good job of explaining.

From NXP documentation:
PCM
Most converters use a frame sync signal to signify the beginning of a
new sample of audio data. These converters are usually associated with
mono or single channel converters. The frame sync pulse frequency is
usually the sample rate in the single channel converter. There are a
few variations, such as whether the most significant bit (MSB)
or least significant bit (LSB) comes first or if the data starts with
the frame sync or one bit time after. Other variations have to do with
frame sync and clock being active high or active low. The figures
in the application note show some examples of audio data formats. The
frame sync signal determines when the next audio sample is to be
transferred between the controller and the converter. Also, the frame
sync pulse can be one bit time long or longer; either way its
frequency is usually the sample rate. There are some
variations to accommodate more audio channels from having every other
frame be a different channel to having the bit clock be fast enough to
have more than one channel data in each frame sync. For example,
having 32-bits transferred each frame sync when the data sample size
is 16-bits. These channel variations can be interfaced to the MPC5200
PSC, but usually stereo 2-channel converters use an I2S interface, as
described in the next section.
I2S
I2S was defined by Philips for 2-channel stereo audio streams.
The left or right channel audio data is defined by the state of the
LRCK signal. The LRCK is the frame sync signal and defines the sample
frequency for the data. I2S can accommodate any data size usually from
8 to 32 bits for each channel with the most significant bit (MSB)
first. Notice the data is shifted by one bit from the start of the
LRCK. Since the MSB comes first, the controller can output more or
fewer bits than the converter is expecting. For example, if the
converter is 32-bit, but the controller only has 16-bit samples, the
data can be left-justified to the MSB and have the lower 16 bits set
to 0. The converter can still accurately represent the signal in 32 bits. The same connection can be used for 8- or 32-bit data samples without
changing anything except the number of bits used in the audio sample.
A variation on I2S called left-justified swaps the meaning of the
frame sync states, from low meaning left to high meaning
left, and it removes the single clock delay for the first bit in
relation to the frame sync signal. The MPC5200 PSC can easily work
with either format.
https://www.nxp.com/docs/en/application-note/AN2979.pdf
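As an illustration of the left-justification described above, here's a short sketch in plain C (not MPC5200-specific): a 16-bit sample is shifted to the MSB of a 32-bit slot and the low bits are zero, so a 32-bit converter reproduces the same signal:

#include <stdint.h>

/* Left-justify a 16-bit PCM sample into a 32-bit I2S slot: the MSB stays
   the MSB and the lower 16 bits are 0. */
uint32_t left_justify_16_in_32(int16_t sample16)
{
    return (uint32_t)(uint16_t)sample16 << 16;
}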

Related

Play audio with PWM of an ATtiny85

I'm trying to understand how to implement audio playback from scratch on an ATtiny85. The goal is to play a short sound (cat meows, so I want it to remain recognizable) from an array representing the strength of an audio signal sampled at a fixed interval.
As far as I understand, signal strength is linearly mapped to the voltage of an analogue audio signal. As far as I know, audio cards are digital-to-analogue converters, but the ATtiny85 probably doesn't have one.
I'm curious whether I can use PWM to play the sound back. Since PWM changes the average voltage by changing the duty cycle of alternating high and low phases of the signal, it would most likely result in a drop in audio quality. WAV sampling rates can differ between 1 Hz and 4.3 GHz according to Google. The ATtiny85 has an internal clock with a frequency up to 8 MHz (which I hope is the same for its PWM generator).
Considering reconfiguring the timer and PWM settings as well as looping over the array, what is the maximum sampling rate of audio I can reliably play? And should I even try to do it with PWM, or are there better options?
Given a system clock of 8 MHz, you can use PWM to generate mono (single-channel) audio.
Consider a PWM period of 1000 clocks, giving you about 10-bit resolution. The sample rate will then be 8000 Hz, which gives you some kind of lo-fi audio.
If you reduce your signal resolution to 8 bits, you'll get 8 MHz / 2^8 = 31.25 kHz sample rate. This gets near hi-fi.
Synchronize your sample output with the PWM generator, and use an appropriate analogue filter.
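Here's a minimal sketch of that 8-bit/31.25 kHz setup for an ATtiny85 at 8 MHz (avr-gcc/avr-libc; register names from the ATtiny85 datasheet; the sample table is a made-up placeholder, not real audio):

#include <avr/io.h>
#include <avr/interrupt.h>
#include <avr/pgmspace.h>

/* Placeholder 8-bit samples; real audio would be a long table in flash. */
const uint8_t samples[] PROGMEM = { 128, 152, 176, 152, 128, 104, 80, 104 };
#define NUM_SAMPLES (sizeof(samples) / sizeof(samples[0]))

volatile uint16_t idx = 0;

ISR(TIMER0_OVF_vect)                  /* fires once per PWM period: 31.25 kHz */
{
    OCR0A = pgm_read_byte(&samples[idx]);  /* next duty cycle = next sample */
    if (++idx >= NUM_SAMPLES) idx = 0;     /* loop the table */
}

int main(void)
{
    DDRB  |= (1 << PB0);                                  /* OC0A as output */
    TCCR0A = (1 << COM0A1) | (1 << WGM01) | (1 << WGM00); /* fast PWM on OC0A */
    TCCR0B = (1 << CS00);                                 /* no prescaler */
    TIMSK |= (1 << TOIE0);                                /* overflow IRQ on */
    sei();
    for (;;) { }                          /* playback happens in the ISR */
}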
Many years ago I built a digital door bell with a sample rate of 8 kHz and 8-bit samples. It played nice sounds at telephone quality. The microcontroller was an 8051 derivative and it used an R-2R ladder as DAC.
A simple sine can be generated by using a 50% PWM signal and varying the frequency. Given some filtering effect through the speaker, it would mimic a single-tone audio signal.
Making more advanced tones (needed for natural sound) quickly gets more complicated, and the duty cycle of the signal can also be used to trick the human ear into hearing harmonics. Check out the Arduino function tone() for some inspiration.
Be careful when connecting a small speaker to the Arduino; preferably a transistor/buffer/small amplifier should be placed between the Arduino and the speaker.

ESP8266 analogRead() microphone input into playable audio

My goal is to record audio using an electret microphone hooked into the analog pin of an ESP8266 (12E) and then be able to play this audio on another device. My circuit is:
In order to check the output of the microphone I connected the circuit to the oscilloscope and got this:
In the "gif" above you can see the waves made by my voice when talking to microphone.
here is my code on esp8266:
void loop() {
  sensorValue = analogRead(sensorPin);
  Serial.print(sensorValue);
  Serial.print(" ");
}
I would like to play the audio in the "Audacity" software in order to get an understanding of the result. Therefore, I copied the numbers from the serial monitor and pasted them into the Python code that maps the data to the (-1, 1) interval:
def mapPoint(value, currentMin, currentMax, targetMin, targetMax):
    currentInterval = currentMax - currentMin
    targetInterval = targetMax - targetMin
    valueScaled = float(value - currentMin) / float(currentInterval)
    return round(targetMin + (valueScaled * targetInterval), 5)

class mapper():
    def __init__(self, raws):
        self.raws = raws.split(" ")
        self.raws = [float(i) for i in self.raws]

    def mapAll(self):
        self.mappeds = [mapPoint(i, min(self.raws), max(self.raws), -1, 1) for i in self.raws]
        self.strmappeds = str(self.mappeds).replace(",", "").replace("]", "").replace("[", "")
        return self.strmappeds
This takes the string of numbers, maps them onto the target interval (-1, +1) and returns a space (" ") separated string of data ready to import into Audacity (Tools > Sample Data Import, then select the text file containing the data). The result of importing data from almost 5 seconds of voice:
which is about half a second, and when I play it I hear unintelligible noise. I also tried lower frequencies, but there was only noise there, too.
The suspected causes of the problem are:
1- The ESP8266 does not have the capability to read the analog pin fast enough to return meaningful data (which is probably not the case, since its clock speed is around 100 MHz).
2- The way the software gathers the data and outputs it is not the most optimized way (in the loop, Serial.print, etc.).
3- The microphone circuit output is too noisy (which might be, but as observed in the oscilloscope test, my voice has to make a difference in the output audio, which was not audible in Audacity).
4- The way I mapped and prepared the data for Audacity.
Is there something else I could try?
Are there similar projects out there? (To my surprise, I couldn't find anything that was done transparently!)
What is the right way to do this? (It could be a very useful and economical method for recording, transmitting and analyzing audio.)
There are many issues with your project:
You do not set a bias voltage on A0. The ADC can only measure voltages between ground and VCC. With the microphone removed from the circuit, the voltage at A0 should be close to VCC/2. This is usually achieved by adding a voltage divider made of 2 resistors between VCC and GND, connected directly to A0 (between the cap and A0).
Also, your circuit looks weird... Is the 47 uF cap connected directly to 3.3 V? If that's the case, you should connect it to pin 2 of the microphone instead. This would also indicate that right now your ADC is only recording noise (a missing bias voltage will do that).
You do not pace your input, meaning that you do not have a constant sampling rate. That is a very important issue. I suggest you set yourself a realistic target that is well within the limits of the ADC and the limits of your serial port. The transfer rate in bytes/sec of a serial port with common 8N1 framing is roughly the baud rate divided by 10. At 9600 baud, that's only about 960 bytes/sec, which means that once the samples are converted to text, your max transfer rate drops to roughly 300 samples per second. This issue needs to be addressed, and the maximum calculated, before you begin: the attainable overall sample rate is the minimum of the ADC's sample rate and the serial port's transfer rate.
How to grab samples depends a lot on your needs: what you are trying to do with this project, your audio bandwidth, resolution and audio quality requirements, and the amount of work you can put into it. Reading from a loop as you are doing now may work with a fast enough serial port, but the quality will always be poor.
The way this is usually done is with a timer interrupt starting the ADC measurement and an ADC interrupt grabbing the result and storing it in a small FIFO, while the main loop transfers data from this ADC FIFO to the serial port alongside the other tasks assigned to the chip. This cannot be done directly with the Arduino libraries, as you need to control the ADC directly.
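A minimal sketch of that ISR-to-main-loop FIFO pattern (generic Arduino-style C, not ESP8266-specific; attachSampleTimer() is a hypothetical stand-in for the chip-specific timer setup, and a real implementation would read the ADC result register in the ISR instead of calling analogRead()):

#define FIFO_SIZE 256                  /* power of two keeps wraparound cheap */
volatile uint16_t fifo[FIFO_SIZE];
volatile uint16_t head = 0, tail = 0;  /* head: written by ISR; tail: by loop */

void onSampleTick() {                  /* called at a fixed rate, e.g. 8 kHz */
  uint16_t next = (head + 1) & (FIFO_SIZE - 1);
  if (next != tail) {                  /* if the FIFO is full, drop the sample */
    fifo[head] = analogRead(A0);
    head = next;
  }
}

void setup() {
  Serial.begin(500000);                   /* fast enough for 8 kHz x 2 bytes */
  attachSampleTimer(8000, onSampleTick);  /* hypothetical 8 kHz timer hook */
}

void loop() {
  while (tail != head) {               /* drain whatever the ISR has queued */
    uint16_t s = fifo[tail];
    tail = (tail + 1) & (FIFO_SIZE - 1);
    Serial.write(s >> 8);              /* binary output: no text overhead */
    Serial.write(s & 0xFF);
  }
}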
Here is a short checklist of things to do:
Get the full ESP8266 datasheet from Espressif. Look up the actual specs of the ADC, mainly the sample rates and resolutions available with your oscillator, and also its electrical constraints, at least its input voltage range and input impedance.
Once you know these numbers, set yourself some targets; the math needed for a successful project needs input numbers. What is your application? Do you want to record audio or just detect a nondescript noise? What are the minimum requirements for things to work?
Look up in the Arduino documentation how to set up a timer interrupt and an ADC interrupt.
Look up in the datasheet which registers you'll need to access to configure and run the ADC.
Fix the voltage bias issue on the ADC input. Nothing can work before that's done, and you do not want to destroy your processor.
Make sure the input AC voltage (the 'swing' voltage) is large enough to give you the results you want. It is not unusual to have to amplify a mic signal (with an opamp or a transistor), just for impedance matching.
Then you can start writing code.
This may sound awfully complex for such a small task, but that's what the average day of an embedded programmer looks like.
[EDIT] Your circuit would work a lot better if you simply replaced the 47 uF DC blocking capacitor with a series resistor. Its value should be in the 2.2k to 7.6k range, to keep the circuit impedance within the 10 kOhm or so needed for the ADC. This would ensure that the input voltage at A0 is within the operating limits of the ADC (GND-3.3V on the NodeMCU board, 0-1V with the bare chip).
The signal may still be too weak for your application, though. What is the amplitude of the signal on your scope? How many bits of resolution does that range cover once converted by the ADC? For example, for a 0.1 V peak-to-peak signal (SIG = 0.1), an ADC range of 0-3.3 V (RNG = 3.3) and 10 bits of resolution (RES = 1024), you'll have
binary-range = RES * (SIG / RNG)
             = 1024 * (0.1 / 3.3)
             = 1024 * 0.0303
             = 31.03
A range of 31 means around log2(31) (~= 5) useful bits of resolution. Is that enough for your application?
As a side note: the ADC will give you positive values with a DC offset. You will probably need to filter the digital output with a DC-blocking filter before playback. https://manual.audacityteam.org/man/dc_offset.html
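If you'd rather remove the DC offset in your own code than in Audacity, a standard one-pole DC-blocking filter is only a few lines (plain C sketch; the 0.995 pole is a typical default, not a tuned value):

/* One-pole DC blocker: y[n] = x[n] - x[n-1] + R * y[n-1].
   R close to 1 gives a very low cutoff frequency. */
static float dc_block(float x)
{
    static float x1 = 0.0f, y1 = 0.0f;
    float y = x - x1 + 0.995f * y1;
    x1 = x;
    y1 = y;
    return y;
}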

What is audio PCM's frame sync word to identify the beginning position?

As the title says: some compressed formats such as EAC3 and AC3 start each frame with a sync word.
So what is PCM (raw audio)'s sync word? How do I identify the beginning of a PCM frame?
I ran into a problem where audio is concatenated from several audio segments, each of which has a different frame size. I need to identify the start position.
Thanks in advance.
There is no such concept as a frame in PCM. The purpose of a frame is to indicate points of random access. In PCM every single sample is a point of random access, hence start indicators are not required, and there is no standard frame size. It's all up to you.
A PCM frame is different from the frames you're describing, in that a frame is just a single sample across all channels. That is, if I'm recording 16-bit stereo PCM audio, each frame is 4 bytes (32 bits) long.
There is no sync word, nor frame header in raw PCM. It's just a stream of data. You need to know the bit depth, channel count, and current offset if you want to sync to it. (Or, you need to do some simple heuristics. For example, apply several different formats and offsets to a small chunk of data and see which one has the least variance/randomness from sample to sample.)
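Since there is no sync word, the only "parsing" you can do is arithmetic on parameters known out-of-band. A small sketch in C, assuming hypothetical 48 kHz / 16-bit / stereo parameters:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    /* Assumed, known out-of-band -- raw PCM itself carries none of this: */
    const uint32_t sample_rate      = 48000;  /* Hz */
    const uint32_t channels         = 2;      /* stereo */
    const uint32_t bytes_per_sample = 2;      /* 16-bit */

    uint32_t frame_size = channels * bytes_per_sample;  /* 4 bytes per frame */
    double   t          = 1.5;                          /* seek target: 1.5 s */
    uint64_t offset     = (uint64_t)(t * sample_rate) * frame_size;

    printf("frame size %u bytes; 1.5 s starts at byte %llu\n",
           (unsigned)frame_size, (unsigned long long)offset);
    return 0;
}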

Is interrupt jitter causing the annoying wobble in audio using the MCU's DAC?

I had an assignment for college where we needed to play a precompiled WAV, stored as an integer array, through PWM and the DAC. Now, I wanted more of a challenge, so I went out of my way and created an audio DAC over USB using the microcontroller in question: the STM32F051. It basically listens to my sound card output using a WASAPI loopback recorder, changes the resolution from 16 to 12 bits (since the DAC on the STM32 only has 12-bit resolution) and sends it over USART at a baud rate of 20x the sample rate (2 bytes per sample x 10 bits on the wire per byte; 960000 in my case). All done in C#.
On the microcontroller I simply use an interrupt for USART and push the received data to the DAC.
It works pretty well, much better than PWM, and at a decent sample frequency of 48 kHz.
But... here it comes. When there is some (mostly) high-pitched symphonic melody it starts to sound "wobbly".
Here's a video where you can hear it: https://youtu.be/xD3uTP9etuA?t=88
I read up on the internet a bit about DIY DACs, and someone somewhere (I don't remember where) mentioned that MCUs in general have interrupt jitter. So my basic question is: is interrupt jitter actually causing this? If so, are there ways to limit the jitter?
Or is this something entirely different?
I am thinking of trying to compact the PCM data sent over serial. As said before, the resolution is 12 bits, but samples are sent as packets of two 8-bit bytes forming 16 bits, hence the byte rate is twice the sample rate. My plan is to shift the 12 bits to the MSB and add the first four bits of the next 12-bit value to the current 16-bit variable, so only 12 byte transfers are needed instead of 16 per 8 samples (see the packing sketch below; I might also read up on more efficient ways of compacting data for transport). I would then put the samples in a buffer and use another timer that triggers at 48 kHz to send the samples to the DAC. Would this concept work? Or would I just be wasting time?
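For what it's worth, here is a sketch of that packing idea in plain C: two 12-bit samples fit in three bytes, so 8 samples take 12 bytes instead of 16 (illustrative only, not code from the linked project):

#include <stdint.h>

/* Pack two 12-bit samples (low 12 bits of each input) into 3 bytes. */
static void pack12(uint16_t s0, uint16_t s1, uint8_t out[3])
{
    out[0] = (uint8_t)(s0 >> 4);                               /* s0 bits 11..4 */
    out[1] = (uint8_t)((s0 & 0x0F) << 4) | (uint8_t)(s1 >> 8); /* s0 3..0, s1 11..8 */
    out[2] = (uint8_t)(s1 & 0xFF);                             /* s1 bits 7..0 */
}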
For code, here is the project: https://github.com/EldinZenderink/SoundOverSerial

GNU Radio phase drift of AM demodulation

I am beginning a project using GNU Radio and an inexpensive SDR.
http://www.amazon.com/gp/product/B00SXZDUAQ?psc=1&redirect=true&ref_=oh_aui_search_detailpage
One portion of the project requires me to generate a reference audio tone and compare the phase of that tone to demodulated audio.
To simulate this portion of the system, I have generated a simple GNU Radio flowgraph:
I had some issues with the source and demodulated audio in that they would drift relative to each other. This was visible on the scope sink in the original flowgraph. To aid in troubleshooting, I sent the demodulated audio out through the sound card's second channel and monitored both audio streams, in addition to the modulated RF, on an external oscilloscope:
Initially all seems well, but the demodulated audio drifts in relation to the original source and RF:
My question is: am I doing something wrong in the flowgraph, or am I expecting too much performance out of an inexpensive SDR?
Thanks in advance for any insights
You cannot expect to see zero phase drift in anything short of a fully digital simulation, or a fully analog circuit with exactly one oscillator, because no two (physical) oscillators have identical frequencies.
In your case, there are two relevant oscillators involved:
The sample clock in the RTL-SDR unit.
The sample clock in your sound card output.
Within a GNU Radio flowgraph, there is no time reference per se, and everything depends on the sources and sinks that are connected to hardware.
The relevant source in your flowgraph is the RTL-SDR hardware; insofar as its oscillator deviates from its nominal value (28.8 MHz, as it happens), everything it produces will be off-frequency in an absolute sense (both RF carrier frequencies and the audio frequencies of the demodulated output).
But you don't actually have an absolute frequency reference; you have the tone produced by your sound card. The sound card has its own oscillator, which determines the rate at which samples are converted to analog signals, and therefore the rate at which samples are consumed from the flowgraph.
Therefore, your reference signal will drift relative to your received and demodulated signal, at a rate determined by the difference in frequency error between the two oscillators.
Additionally, since your sound card will be accepting samples from the flowgraph at a slightly different real-time rate than the RTL-SDR is producing them, you will notice periodic glitches in the audio as the error accumulates and must be dealt with; they will start occurring either immediately (if the source is slower than the sink, requiring the sound card to play silence instead) or after a delay for buffers to hit their maximum size (if the source is faster than the sink, requiring the RTL-SDR to drop some samples).
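To put a rough number on the drift rate, here is a back-of-the-envelope calculation in C (the 50 ppm clock mismatch is an assumed figure for illustration, not measured from this setup):

#include <stdio.h>

int main(void)
{
    double tone_hz  = 1000.0;  /* reference tone frequency */
    double ppm_diff = 50.0;    /* assumed mismatch between the two clocks */

    /* The demodulated tone is effectively offset by tone * ppm / 1e6: */
    double freq_error = tone_hz * ppm_diff / 1e6;   /* = 0.05 Hz */

    printf("offset %.3f Hz -> one full 360-degree slip every %.0f s\n",
           freq_error, 1.0 / freq_error);           /* every 20 s */
    return 0;
}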
