I have seen "I2S channel pair" written in some places. One example is below.
http://www.minidsp.com/images/documents/USBStreamer%20Manual.pdf
What does "channel pair" mean here? Does a channel pair mean both the left and right channels of a stereo signal?
Where I see it mentioned in the linked document, the I2S data is run two channels per line over four data lines, something like this:
            ________
FCLK:  ____|        |____
D0:      Ch1     Ch2
D1:      Ch3     Ch4
D2:      Ch5     Ch6
D3:      Ch7     Ch8
Page 13 of the document is referring to grouping these into stereo pairs, where each pair can be controlled with a single volume slider, as opposed to independent control, where there is an individual slider for each channel. So the pairs would be (1,2), (3,4), (5,6), (7,8). This is all entirely outside of I2S, as the volume control is happening in the sender or the receiver.
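If it helps to see what the "pairs" mean for volume control in code, here is a small sketch (not taken from the manual; the function names, buffers and gain values are made up for illustration) of paired versus independent gain for eight channels:

    #include <stdint.h>

    #define NUM_CHANNELS 8

    /* Paired control: one gain value is shared by both channels of a pair,
     * so pairs (1,2), (3,4), (5,6), (7,8) each track a single slider. */
    void apply_paired_gain(int16_t frame[NUM_CHANNELS],
                           const float pair_gain[NUM_CHANNELS / 2]) {
        for (int ch = 0; ch < NUM_CHANNELS; ch++) {
            frame[ch] = (int16_t)(frame[ch] * pair_gain[ch / 2]);
        }
    }

    /* Independent control: every channel has its own gain value. */
    void apply_independent_gain(int16_t frame[NUM_CHANNELS],
                                const float gain[NUM_CHANNELS]) {
        for (int ch = 0; ch < NUM_CHANNELS; ch++) {
            frame[ch] = (int16_t)(frame[ch] * gain[ch]);
        }
    }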
I'm currently trying to learn about how WAV files are processed and stored. Most of the resources I've looked at clearly explain how the header chunk is processed, but not the data (this is the one I've found the most helpful). From the WAV file I'm inspecting I get:
NumChannels = 2
SampleRate = 44100
BitsPerSample = 16
Subchunk2Size = 2056192 (11.65s audio file).
NumSamples = 514048
So from my understanding, 44100 samples are played each second and each sample is 16 bits. There are a total of 514048 samples in this recording. But what about the number of channels? How does that affect reading the data? The resource I mentioned shows:
But I don't quite understand what this means. Isn't this showing a sample as being 32 bits? And what about the right and left channels? Wouldn't they alternate? Why are they in groups of two before changing to the other channel?
The diagram is somewhat unclear, but this is what I understand from it, plus the other information you gave:
each ellipse contains 16 bits (two bytes, four hex digits), so one sample;
there are pairs of samples;
the label "right channel samples" points to the right-hand sample of each pair;
similarly, "left channel samples" points to the left-hand samples.
So it looks to me that the left and right channel samples do alternate.
As for the numbering, I guess the intent was to show that the first pair of samples are each "sample 2" in their respective channels, followed by a pair that are "sample 3", and so on. I would have labelled them "sample pair 2" etc.
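If it helps to see that in code, here is a minimal C sketch of reading the interleaving back, assuming 16-bit little-endian PCM with two channels as in your file, so the data chunk is effectively a flat array of int16_t values in which left and right samples alternate:

    #include <stdint.h>
    #include <stddef.h>

    /* Split an interleaved stereo buffer [L0, R0, L1, R1, ...] into
     * separate left and right arrays. num_frames is the number of
     * left/right sample pairs. */
    void deinterleave(const int16_t *data, size_t num_frames,
                      int16_t *left, int16_t *right) {
        for (size_t i = 0; i < num_frames; i++) {
            left[i]  = data[2 * i];      /* even positions: left channel  */
            right[i] = data[2 * i + 1];  /* odd positions:  right channel */
        }
    }

Here num_frames is the number of left/right pairs, which appears to be what your NumSamples figure counts: 514048 / 44100 is about 11.65 seconds, matching the file length, and 514048 pairs of 16-bit samples is 2056192 bytes, matching Subchunk2Size.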
I was studying the differences between 2.4 GHz and 5 GHz, and I could understand the whole concept of speed, range, frequency, etc.
But I still can't understand what a channel is. I found some definitions, but they don't make sense to me: "a wireless channel is a way to fine tune and alter the frequency". Could someone explain, please?
The channels are just an agreed way to refer to different regions within the portion of spectrum allocated to a particular WiFi band.
For example, each channel in the 2.4 GHz spectrum is 5 MHz from the next one, or more accurately the 'centres' of the channels are that distance apart. See below for a diagram from Wikipedia (https://en.wikipedia.org/wiki/List_of_WLAN_channels):
It's important to note that WiFi needs a certain range on each side of the centre frequency of a channel (which, again, is simply shorthand for a specific frequency within the band). This is shown above, and it's easy to see from this how channels can 'overlap', which is another common term used in WiFi.
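To make the "channel number is just shorthand for a centre frequency" point concrete, here is a small C sketch for the 2.4 GHz band: channels 1-13 sit 5 MHz apart starting at 2412 MHz, and channel 14 is a Japan-only exception at 2484 MHz.

    #include <stdio.h>

    /* Centre frequency in MHz for a 2.4 GHz WiFi channel number. */
    int centre_frequency_mhz(int channel) {
        if (channel >= 1 && channel <= 13)
            return 2407 + 5 * channel;   /* 2412, 2417, ..., 2472 MHz */
        if (channel == 14)
            return 2484;                 /* Japan-only special case */
        return -1;                       /* not a 2.4 GHz channel */
    }

    int main(void) {
        for (int ch = 1; ch <= 14; ch++)
            printf("channel %2d -> %d MHz\n", ch, centre_frequency_mhz(ch));
        return 0;
    }

Because each transmission is roughly 20 MHz wide around that centre, channels only 5 MHz apart necessarily overlap, which is why non-overlapping sets like 1/6/11 are usually recommended.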
I have an FFT output from a microphone and I want to detect a specific animal's howl from that (it howls in a characteristic frequency spectrum). Is there any way to implement a pattern recognition algorithm in Arduino to do that?
I already have the FFT part of it working with 128 samples at a 2 kHz sampling rate.
Look up audio fingerprinting. Essentially, you probe the frequency-domain output from the FFT call and take a snapshot of the range of frequencies together with the magnitude of each frequency, then compare this between the known animal signal and the unknown signal and output a measurement of those differences.
Naturally, this difference will approach zero when the unknown signal is your actual known signal.
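As a rough sketch of that comparison step in C (the bin count and any scaling are assumptions on my part, not a particular library's API):

    #include <stddef.h>

    /* Sum of squared differences between two magnitude spectra of the
     * same length: one from a known howl, one from the unknown signal.
     * The smaller the result, the more alike the spectra are. */
    double spectrum_difference(const double *known_mag,
                               const double *unknown_mag,
                               size_t num_bins) {
        double diff = 0.0;
        for (size_t i = 0; i < num_bins; i++) {
            double d = known_mag[i] - unknown_mag[i];
            diff += d * d;
        }
        return diff;  /* approaches 0 as the signals match */
    }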
Here is another layer: for better fidelity, instead of performing a single FFT of the entire audio available, do many FFT calls, each with a subset of the samples, and for each call slide this window of samples further into the audio clip. Let's say your audio clip is 2 seconds, yet you only ever send 200 milliseconds' worth of samples into each FFT call; this gives you at least 10 such FFT result sets instead of just the one you would get by gulping the entire audio clip. This gives you a notion of time specificity, which is an additional dimension with which to derive a richer difference between the known and unknown signals. Experiment to see whether it helps to slide the window just a tad each time instead of lining the windows up end to end.
To be explicit: you have a range of frequencies spread across the X axis, and along the Y axis you have magnitude values for each frequency at different points in time, as plucked from your audio clips while you vary your sample window as per the paragraph above. So now you have a two-dimensional grid of data points.
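Here is a hedged C sketch of that sliding-window layer; fft_magnitudes() stands in for whatever FFT routine you already have, and the window length and hop size are placeholders to experiment with, not recommendations:

    #include <stddef.h>

    /* Assumed to exist elsewhere: fills num_bins magnitude values for
     * one window of samples (e.g. a wrapper around your existing FFT). */
    void fft_magnitudes(const double *samples, size_t window_len,
                        double *mags, size_t num_bins);

    /* Fill a grid of num_windows x num_bins magnitudes: frequency along
     * one axis, time (window index) along the other. Returns how many
     * windows were actually produced. */
    size_t build_spectrogram(const double *clip, size_t clip_len,
                             size_t window_len, size_t hop,
                             double *grid, size_t num_bins,
                             size_t max_windows) {
        size_t w = 0;
        for (size_t start = 0;
             start + window_len <= clip_len && w < max_windows;
             start += hop, w++) {
            fft_magnitudes(clip + start, window_len,
                           grid + w * num_bins, num_bins);
        }
        return w;
    }

The spectrum_difference() idea from above can then be applied row by row across two such grids and the results summed.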
Again, to beef up the confidence intervals, you will want to perform all of the above across several different audio clips of your known source animal howl against each of your unknown signals, so now you have a three-dimensional parameter landscape. As you can see, each additional dimension you can muster will give more traction and hence more accurate results.
Start with easily distinguished known audio against very different unknown audio, say a 50 Hz sine tone for the known audio signal against an 8000 Hz sine wave for the unknown. Then try a single strum of a guitar as your known and, say, a trumpet as your unknown, and then progress to using actual audio clips.
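If you want to generate those test tones yourself rather than record them, a sine buffer is only a few lines of C (the sample rate and length are arbitrary choices here, not something tied to your Arduino setup):

    #include <math.h>
    #include <stddef.h>

    /* Fill buf with num_samples values of a pure sine tone at freq_hz,
     * sampled at sample_rate_hz, amplitude in [-1, 1]. */
    void make_sine(double *buf, size_t num_samples,
                   double freq_hz, double sample_rate_hz) {
        const double two_pi = 6.283185307179586;
        for (size_t i = 0; i < num_samples; i++)
            buf[i] = sin(two_pi * freq_hz * (double)i / sample_rate_hz);
    }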
Audacity is an excellent free audio workhorse of the industry; it easily plots a WAV file to show its time-domain signal or FFT spectrogram. Sonic Visualiser is also a top-shelf tool to use.
This is not a simple silver bullet, but each layer you add to your solution can give you better results. It is a process you are crafting, not a single-dimensional trigger to squeeze.
I have created a really basic FFT visualizer using a Teensy microcontroller, a display panel, and a pair of headphone jacks. I used kosme's FFT library for Arduino: https://github.com/kosme/arduinoFFT
Analog audio flows into the headphone input and to a junction where the microcontroller samples it. That junction is also connected to an audio out jack so that audio can be passed to some speakers.
This is all fine and good, but currently I'm only sampling the left audio channel. Any time music is stereo separated, the visualization cannot account for any sound on the right channel. I want to rectify this but I'm not sure whether I should start with hardware or software.
Is there a circuit I should build to mix the left and right audio channels? I figure I could do something like so:
But I'm pretty sure that my schematic is misguided. I included bias voltage to try and DC couple the audio signal so that it will properly ride over the diodes. Making sure that the output matches the input is important to me though.
Or maybe should this best be approached in software? Should I instead just be sampling both channels separately and then doing some math to combine them?
Combining the stereo channels on one branch of the fork without combining them on the other two is very difficult. Working in software is much easier.
If you take two sets of samples, you've doubled the amount of math that the microcontroller needs to do.
But if you take readings from both pins, divide each by two, and add them together, you have one set of samples which represents the 'mono' signal.
Keep in mind that human ears have an uneven response to sound volumes, so a 'medium' volume reading on both pins, summed and halved, will result in a 'lower-medium' value. It's better to divide by 1.5 or 1.75 if you can spare the cycles for more complicated division.
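As a sketch of what that looks like in the Teensy sketch itself (the pin names are an assumption about your wiring, and the plain divide-by-two matches the suggestion above; swap in a different divisor if you want the hotter signal):

    /* Assumed wiring: left channel on A2, right channel on A3. */
    const int LEFT_PIN  = A2;
    const int RIGHT_PIN = A3;

    /* Read both channels and combine them into one 'mono' sample.
     * Halving each reading before summing keeps the result inside the
     * ADC's range. */
    int readMonoSample() {
        int left  = analogRead(LEFT_PIN);
        int right = analogRead(RIGHT_PIN);
        return left / 2 + right / 2;
    }

readMonoSample() would then feed the same FFT input buffer you currently fill from the single left-channel pin.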
I am trying to do some work with basic beat detection (in C and/or Java) by following the guide from GameDev.net. I understand the logic behind the implementation of the algorithms, but I am confused as to how one would get the "sound amplitude" data for the left and right channels of a song (i.e. an mp3 or wav).
For example, he starts with the following assumption:
In this model we will detect sound energy variations by computing the average sound energy of the signal and comparing it to the instant sound energy. Let's say we are working in stereo mode with two lists of values: (a_n) and (b_n). (a_n) contains the list of sound amplitude values captured every Te seconds for the left channel, (b_n) the list of sound amplitude values captured every Te seconds for the right channel.
He then proceeds to manipulate (a_n) and (b_n) using the algorithms that follow. I am wondering how one would do the signal processing necessary to get (a_n) and (b_n) every Te seconds for both channels, so that I can begin to follow his guide and mess around with some simple beat detection in songs.
An uncompressed audio file (a .wav or .aiff, for example) is, for the most part, a long array of samples. Each sample consists of the amplitude at a given point in time. When music is recorded, many of these amplitude samples are taken each second.
For stereo (2-channel) audio files, the samples in the array usually alternate channels: [sample1 left, sample1 right, sample2 left, sample2 right, etc...].
Most audio parsing libraries will already have a way of returning the samples separately for each channel.
Once you have the sample array for each channel, it is easy to find the samples for a particular second, as long as you know the sample rate, i.e. the number of samples per second. For example, if the sample rate for your file is 44100 samples per second and you want to capture the samples in the nth second, you would use the part of your vector that is between (n * 44100) and ((n + 1) * 44100).
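Putting those two points together, here is a minimal C sketch, assuming 16-bit interleaved stereo at 44100 samples per second, that pulls out second n as the separate left and right lists (the (a_n) and (b_n) of the article); the buffer names are made up for illustration:

    #include <stdint.h>
    #include <stddef.h>

    #define SAMPLE_RATE 44100

    /* Copy the left/right samples of second n into two arrays of
     * SAMPLE_RATE values each. 'interleaved' is the whole data chunk
     * laid out as [L0, R0, L1, R1, ...]. */
    void samples_for_second(const int16_t *interleaved, size_t n,
                            int16_t *a, int16_t *b) {
        size_t start_frame = n * SAMPLE_RATE;
        for (size_t i = 0; i < SAMPLE_RATE; i++) {
            a[i] = interleaved[2 * (start_frame + i)];      /* left  */
            b[i] = interleaved[2 * (start_frame + i) + 1];  /* right */
        }
    }

From there you can compute the instant and average sound energy of each block exactly as the guide describes.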