About audio record sample rate - audio

We want to record stereo audio with AudioRecord, as shown below.
If we set the sample rate to 44,100, is each stereo channel recorded
at 44,100 Hz or at 22,050 Hz?
In our implementation, it seems that half the sampling frequency is applied to each channel.
AudioRecord audioInputStream = new AudioRecord(MediaRecorder.AudioSource.CAMCORDER,
sampleRate, AudioFormat.CHANNEL_IN_STEREO, AudioFormat.ENCODING_PCM_16BIT,
samplesPerBuffer * bytesPerSample);

The sample rate is constant no matter how many channels there are. So with 1 channel at 44.1 kHz you get 44100 total samples per second, and with 2 channels you get 88200 total samples per second.
I don't really know the API you are using, but I can point to one possible source of confusion that arises from terminology: the difference between a sample and a frame. Usually a sample is a single value, and a frame contains a single sample for each channel. So if you encounter an API that looks something like process(double* samples, int numChannels, int numFrames), beware that the actual number of samples in the buffer is numChannels*numFrames. Misinterpreting something like that could definitely lead to consuming half as many samples as you expect. Also, some APIs confusingly use the term numSamples when they should have used numFrames, and so on.
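To make the sample/frame distinction concrete, here is a minimal Python sketch. It assumes the buffer is interleaved (L0, R0, L1, R1, ...), which is the usual layout for stereo PCM; the helper name deinterleave and the placeholder data are made up for illustration and are not part of any API mentioned above.

```python
def deinterleave(samples, num_channels):
    """Split an interleaved buffer [L0, R0, L1, R1, ...] into one list per channel."""
    if len(samples) % num_channels != 0:
        raise ValueError("buffer does not hold a whole number of frames")
    # Channel ch owns every num_channels-th sample, starting at index ch.
    return [samples[ch::num_channels] for ch in range(num_channels)]


sample_rate = 44100        # frames per second, the same for every channel
num_channels = 2           # stereo
num_frames = sample_rate   # one second of audio

# Placeholder data: one value per sample, so the buffer holds frames * channels values.
interleaved = list(range(num_frames * num_channels))

left, right = deinterleave(interleaved, num_channels)
print(len(interleaved))        # total samples in one second of stereo audio
print(len(left), len(right))   # samples per channel in that same second
```

Each channel still ends up with 44100 values for the one-second buffer; only the total number of values in the buffer doubles.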

Related

How samples are aligned in the audio file?

I'm trying to better understand how samples are aligned in the audio file.
Let's say we have a 2s audio file with sampling rate = 3.
I think there are three possible ways to align those samples. Looking at the picture below, can you tell me which one is correct?
Also, is this a standard for all audio files, or do different formats have different rules?
Cheers!
The sampling rate tells you how many samples are taken per second; it is expressed in hertz (Hz). Strictly speaking, the correct answer is (1), since you have 3 samples within each second. Assuming there is no latency, PCM and other formats dictate that the audio starts at time 0. The next "cycle" (the next second) also starts at its own zero, the same principle as with a clock.
To get the total length of the audio (following the question in the comment), simply divide the number of samples by the rate. Here is an example from a 30 s WAV file using soxi, one of the canonical tools used in the community for sound manipulation:
Input File : 'book_00396_chp_0024_reader_11416_5_door_Freesound_validated_380721_0-door_Freesound_validated_381380_0-9IfN8dUgGaQ_snr10_fileid_1138.wav'
Channels : 1
Sample Rate : 16000
Precision : 16-bit
Duration : 00:00:30.00 = 480000 samples ~ 2250 CDDA sectors
File Size : 960k
Bit Rate : 256k
Sample Encoding: 16-bit Signed Integer PCM
480000 samples / (16000 samples/second) = 30 seconds exactly. Citing the manual, the duration is "Equivalent to number of samples divided by the sample-rate."
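The same division can be reproduced with Python's standard wave module. This is only a sketch: instead of the file from the soxi output, it builds an in-memory mono 16-bit WAV of silence with the same rate and length, and the helper name wav_duration_seconds is made up for illustration.

```python
import io
import wave


def wav_duration_seconds(wav_file):
    """Duration = number of frames / frame rate (a frame is one sample per channel)."""
    with wave.open(wav_file, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


# Stand-in for a real file: 480000 frames of 16-bit mono silence at 16 kHz.
buffer = io.BytesIO()
with wave.open(buffer, "wb") as wav:
    wav.setnchannels(1)
    wav.setsampwidth(2)       # 2 bytes = 16 bits per sample
    wav.setframerate(16000)
    wav.writeframes(b"\x00\x00" * 480000)

buffer.seek(0)
duration = wav_duration_seconds(buffer)
print(duration)
```

For a mono file the frame count and the sample count are the same number; for multi-channel files the duration comes from the frame count, not the total sample count.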

What is the bit rate?

I am new to audio programming,
and I am wondering about the formula for bit rate.
According to wiki https://en.wikipedia.org/wiki/Bit_rate#Audio,
bit rate = sample rate X bit depth X channels
and
sample rate is the number of samples (or snapshots taken) per second obtained by a digital audio device.
bit depth is the number of bits of information in each sample.
So why is bit rate = sample rate X bit depth X channels?
From my perspective, if bit depth = 2 bits and sample rate = 3 Hz,
then I can transfer 6 bits of data in 1 second.
For example:
Sample data = 00 //at 1/3 second.
Sample data = 01 //at 2/3 second.
Sample data = 10 //at 3/3 second.
So I transfer 000110 in 1 second. Is that logic correct?
Bit rate is the expected number of bits per interval (e.g. per second).
Sample rates are measured in hertz, where 1 hertz means one sample per second. So to get the full sound data that represents 1 second of audio, you calculate how many bits need to be sent (media players check the bit rate in a file format's settings so that they can read and play back correctly).
Why is channels involved (isn't sample rate X bit-depth enough)?
In digital audio, samples are sent for each "ear" (L/R channel). A stereo sound always has double the number of samples of the same sound in mono. Usually there is a "flag" to specify whether the sound is stereo or mono.
Logic example (ignoring bit depth by assuming 1 bit per sample):
The speech "Hello" is recorded so that one second of audio holds 200 samples in total, and it is played back at a rate of 100 samples per second per channel. What happens?
If the stereo flag is set, each ear gets 100 samples per second (the correct total of 200 played per second).
If it is treated as mono, the speech sounds slowed by half: only 100 samples are played per second, but a full second was recorded as 200 samples. You get half of "Hello" in one second and the other half in the next second (slowed speech).
As the example above shows, you will run into these half-speed and double-speed playback adventures in your "new to audio programming" experience. The fix is to set either the channel count or the bit rate correctly. Good luck.
The 'sample rate' is the rate at which each channel is sampled.
So 'sample rate X bit depth' will give you the bit rate for a single channel.
You then need to multiply that by the number of channels to get the total bit rate flowing through the system.
For example, the CD standard has a sample rate of 44100 samples per second and a bit depth of 16, giving a bit rate of 705600 bits per second per channel and a total bit rate of 1411200 bits per second for stereo.
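As a quick check of the formula for uncompressed PCM, here is a small Python sketch; the function name pcm_bit_rate is made up for illustration. It covers both the 2-bit, 3 Hz toy case from the question and the CD figures.

```python
def pcm_bit_rate(sample_rate_hz, bit_depth, channels):
    """Bits per second of uncompressed PCM: each channel is sampled at the full sample rate."""
    return sample_rate_hz * bit_depth * channels


toy = pcm_bit_rate(3, 2, 1)                  # the question's example: one channel
cd_per_channel = pcm_bit_rate(44100, 16, 1)  # a single CD channel
cd_stereo = pcm_bit_rate(44100, 16, 2)       # both CD channels together

print(toy, cd_per_channel, cd_stereo)
```

The question's reasoning (6 bits in one second) is correct for a single channel; the channels factor only accounts for the additional streams sampled at the same rate.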

can a DSP combine two 48khz audio streams to create a 96khz output

Wireless connections like Bluetooth are limited by transmission bandwidth, which results in a limited bit rate and audio sampling frequency.
Can a high-definition audio output like 24-bit/96 kHz be created by combining two separate audio streams of 24-bit/48 kHz each, transmitted from a source to receiver speakers/earphones?
I tried to understand how a DSP (digital signal processor) works, but I am unable to find the exact technical terms that describe this kind of audio splitting and recombining technique for increasing the audio resolution.
No, you would have to upsample the two original audio streams to 96 kHz. Combining two audio streams will not increase audio resolution; all you're really doing is summing two streams together.
You'll probably want to read this free DSP resource for more information.
Here is a simple construction which could be used to create two audio streams at 24bit/48kHz from a higher resolution 24bit/96kHz stream, which could later be recombined to recreate a single audio stream at 24bit/96kHz.
Starting with an initial high resolution source at 24bit/96kHz {x[0],x[1],x[2],...}:
Take every even sample of the source (i.e. {x[0],x[2],x[4],...} ), and send it over your first 24bit/48kHz channel (i.e. producing the stream y1 such that y1[0]=x[0], y1[1]=x[2], ...).
At the same time, take every odd sample {x[1],x[3],x[5],...} of the source, and send it over your second 24bit/48kHz channel (i.e. producing the stream y2 such that y2[0]=x[1], y2[1]=x[3], ...).
At the receiving end, you should then be able to reconstruct the original 24bit/96kHz audio signal by interleaving the samples from your first and second channel. In other words you would be recreating an output stream out with:
out[0] = y1[0]; // ==x[0]
out[1] = y2[0]; // ==x[1]
out[2] = y1[1]; // ==x[2]
out[3] = y2[1]; // ==x[3]
out[4] = y1[2]; // ==x[4]
out[5] = y2[2]; // ==x[5]
...
That said, transmitting those two streams of 24bit/48kHz would require an effective bandwidth of 2 * 24 bit * 48000 Hz = 2304 kbps, which is exactly the same as transmitting one stream of 24bit/96kHz. So, while this allows you to fit the audio stream in channels of fixed bandwidth, you are not reducing the total bandwidth requirement this way.
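The split-and-interleave construction described in this answer can be sketched in a few lines of Python. The function names split_even_odd and interleave are made up for illustration, and plain lists of integers stand in for the 24-bit samples.

```python
def split_even_odd(x):
    """Split a stream into its even-indexed samples (y1) and odd-indexed samples (y2)."""
    return x[0::2], x[1::2]


def interleave(y1, y2):
    """Rebuild the original stream: out[2k] = y1[k], out[2k + 1] = y2[k]."""
    out = []
    for even_sample, odd_sample in zip(y1, y2):
        out.extend((even_sample, odd_sample))
    # If the source had an odd number of samples, y1 holds one extra sample.
    out.extend(y1[len(y2):])
    return out


x = list(range(10))            # stand-in for the 96 kHz source samples
y1, y2 = split_even_odd(x)     # two half-rate streams
out = interleave(y1, y2)
print(out == x)

# Bandwidth check in bits per second: two 48 kHz streams versus one 96 kHz stream.
two_streams_bps = 2 * 24 * 48000
one_stream_bps = 24 * 96000
print(two_streams_bps, one_stream_bps)
```

Note that each half-rate stream on its own is not a properly band-limited 48 kHz signal; the sketch only shows that the pair carries the same data as the original.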
Could you please provide your definition of "combining"? Based on the data rates, it seems like you want to multiplex (combine two mono channels into a stereo channel). If the desire is to "add" two channels together (two monos into a single mono, or two stereo channels into one stereo), then you should not have to increase your sampling rate: you are adding two band-limited signals, so increasing the sampling rate is not necessary.

How to get amplitude of an audio stream in an AudioGraph to build a SoundWave using Universal Windows?

I want to build a SoundWave by sampling an audio stream.
I read that a good method is to get the amplitude of the audio stream and represent it with a Polygon. But suppose we have an AudioGraph with just a DeviceInputNode and a FileOutputNode (a simple recorder).
How can I get the amplitude from a node of the AudioGraph?
What is the best way to periodize this sampling? Is a DispatcherTimer good enough?
Any help will be appreciated.
First, everything you care about is kind of here:
uwp AudioGraph audio processing
But since you have a different starting point, I'll explain some more core things.
An AudioGraph node is already periodized for you; that is generally how audio works. I think Win10 defaults to periods of 10 ms and/or 20 ms, but this can (theoretically) be set via the AudioGraphSettings.DesiredSamplesPerQuantum setting, together with AudioGraphSettings.QuantumSizeSelectionMode = QuantumSizeSelectionMode.ClosestToDesired. I believe whether this works actually depends on your audio hardware and not on the OS specifically; my PC can only do 480 and 960. This number is how many samples of the audio signal to accumulate per channel (mono is one channel, stereo is two channels, etc.), and as a by-product it also sets the callback timing.
Win10 and most devices default to a 48000 Hz sample rate, which means they measure or output data that many times per second. So with my quantum size of 480 samples for every frame of audio, I am getting 48000/480 = 100 frames every second, which means I get one every 10 milliseconds by default. If you set your quantum to 960 samples per frame, you get 50 frames every second, or a frame every 20 ms.
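The timing arithmetic above can be written out as a small Python sketch (not UWP code; the function names are made up for illustration). Note that "frame" here follows this answer's usage, meaning one quantum of audio.

```python
def quanta_per_second(sample_rate_hz, samples_per_quantum):
    """How many quanta (callbacks) arrive each second."""
    return sample_rate_hz / samples_per_quantum


def quantum_period_ms(sample_rate_hz, samples_per_quantum):
    """Time covered by one quantum, in milliseconds."""
    return 1000.0 * samples_per_quantum / sample_rate_hz


for quantum_size in (480, 960):
    print(quantum_size,
          quanta_per_second(48000, quantum_size),
          quantum_period_ms(48000, quantum_size))
```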
To get a callback into that frame of audio every quantum, you need to register a handler for the AudioGraph.QuantumProcessed event. You can directly reference the link above for how to do that.
So by default, a frame of data is stored in an array of 480 floats in the range [-1, +1]. To get the amplitude, you just average the absolute values of this data.
This part, including handling multiple channels of audio, is explained more thoroughly in my other post.
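As a language-neutral illustration of the averaging step, here is a minimal Python sketch. It assumes one mono quantum of 480 floats in [-1, +1]; the function name mean_abs_amplitude and the synthetic sine frame are made up for illustration and are not part of the AudioGraph API.

```python
import math


def mean_abs_amplitude(frame):
    """Average of the absolute sample values in one quantum of audio."""
    if not frame:
        return 0.0
    return sum(abs(sample) for sample in frame) / len(frame)


# Synthetic stand-in for one quantum: a single full-scale sine cycle over 480 samples.
frame = [math.sin(2 * math.pi * k / 480) for k in range(480)]
amplitude = mean_abs_amplitude(frame)
print(amplitude)
```

For a full-scale sine this mean is close to 2/pi (about 0.64), not 1.0, so the value is better treated as a relative level for drawing than as a peak reading.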
Have fun!

About definition for terms of audio codec

When I was studying the Cocoa Audio Queue documentation, I came across several terms related to audio codecs. They are defined in a structure named AudioStreamBasicDescription.
Here are the terms:
1. Sample rate
2. Packet
3. Frame
4. Channel
I know about sample rate and channel. However, I am confused by the other two. What do they mean?
You can also answer this question with an example. For instance, I have a dual-channel PCM-16 source with a sample rate of 44.1 kHz, which means there are 2*44100 = 88200 samples of PCM data per second. But what about packets and frames?
Thank you in advance!
You are already familiar with the sample rate definition.
The sampling frequency or sampling rate, fs, is defined as the number of samples obtained in one second (samples per second), thus fs = 1/T, where T is the sampling period.
So for a sampling rate of 44100 Hz, you have 44100 samples per second (per audio channel).
The number of frames per second in video is a similar concept to the number of samples per second in audio. Frames for our eyes, samples for our ears. Additional information here.
If you have 16-bit stereo PCM, it means you have 16*44100*2 = 1411200 bits per second => 176400 bytes (about 172 KiB) per second => around 10 MB per minute.
Here are the definitions, reworded from Apple's documentation:
Sample: a single number representing the value of one audio channel at one point in time.
Frame: a group of one or more samples, with one sample for each channel, representing the audio on all channels at a single point in time.
Packet: a group of one or more frames, representing the audio format's smallest encoding unit, and the audio for all channels across a short amount of time.
As you can see there is a subtle difference between audio and video frame notions. In one second you have for stereo audio at 44.1 kHz: 88200 samples and thus 44100 frames.
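The counts for the 16-bit stereo example can be laid out in a short Python sketch. The variable names are made up for illustration and are not the field names of AudioStreamBasicDescription.

```python
sample_rate = 44100   # frames per second
channels = 2          # stereo
bit_depth = 16        # bits per sample

frames_per_second = sample_rate                       # one frame per sampling instant
samples_per_second = frames_per_second * channels     # one sample per channel per frame
bytes_per_frame = channels * bit_depth // 8           # all channels at one instant
bytes_per_second = frames_per_second * bytes_per_frame
bits_per_second = bytes_per_second * 8

print(frames_per_second, samples_per_second)
print(bytes_per_frame, bytes_per_second, bits_per_second)
```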
Compressed formats like MP3 and AAC pack multiple frames into packets (these packets can then be written to an MP4 file, for example, where they can be efficiently interleaved with video content). Dealing with larger packets helps the encoder identify bit patterns, for better coding efficiency.
MP3, for example, uses packets of 1152 frames, which are the basic atomic unit of an MP3 stream. PCM audio is just a series of samples, so it can be divided down to the individual frame, and it really has no packet size at all.
For AAC you can have 1024 (or 960) frames per packet. This is described in the Apple document you pointed at:
The number of frames in a packet of audio data. For uncompressed audio, the value is 1. For variable bit-rate formats, the value is a larger fixed number, such as 1024 for AAC. For formats with a variable number of frames per packet, such as Ogg Vorbis, set this field to 0.
In MPEG-based file formats, a packet is referred to as a data frame (not to be confused with the audio frame notion above). See Brad's comment for more information on the subject.
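To get a feel for packet sizes, the frames-per-packet figures quoted above can be converted into time spans. This Python sketch assumes a 44.1 kHz stream, which is an assumption made for illustration (the packet sizes themselves do not depend on the rate); the function name is also made up.

```python
def packet_duration_ms(frames_per_packet, sample_rate_hz):
    """Playback time covered by one packet, in milliseconds."""
    return 1000.0 * frames_per_packet / sample_rate_hz


sample_rate = 44100  # assumed for illustration
mp3_ms = packet_duration_ms(1152, sample_rate)  # MP3: 1152 frames per packet
aac_ms = packet_duration_ms(1024, sample_rate)  # AAC: 1024 frames per packet
pcm_ms = packet_duration_ms(1, sample_rate)     # uncompressed PCM: 1 frame per packet

print(round(mp3_ms, 2), round(aac_ms, 2), round(pcm_ms, 4))
```

At this rate an MP3 packet spans roughly 26 ms and an AAC packet roughly 23 ms, while a PCM "packet" is a single frame.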

Resources