Identify HE-AAC audio sampling frequency

I am parsing MPEG-2 and MPEG-4 AV files, and in these files there are PMT tables that hold the information about all the elementary streams. Some of the streams are HE-AAC audio.
For those streams, there is a descriptor (in the PMT) with descriptor tag 0x2B, which implies that the stream carries HE-AAC.
How do I extract the sampling frequency for that stream?

It's not in the PMT. It's in the ADTS header on every AAC frame.
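As a minimal sketch of that lookup: the sampling rate sits in the 4-bit sampling_frequency_index field of the ADTS fixed header. The function names below are made up for illustration. One caveat, stated as an assumption: for HE-AAC carried in ADTS, the header commonly signals the core AAC rate, and the SBR-decoded output rate is twice that, so the second helper simply doubles the value.

```python
# Sampling rates indexed by the 4-bit sampling_frequency_index field.
ADTS_SAMPLE_RATES = (96000, 88200, 64000, 48000, 44100, 32000, 24000,
                     22050, 16000, 12000, 11025, 8000, 7350)


def adts_core_sample_rate(header: bytes) -> int:
    """Return the sampling rate signalled in an ADTS header (first 7 bytes)."""
    if len(header) < 7:
        raise ValueError("an ADTS header is at least 7 bytes long")
    # 12-bit syncword 0xFFF, then ID (ignored), layer (must be 0),
    # protection_absent (ignored).
    if header[0] != 0xFF or (header[1] & 0xF6) != 0xF0:
        raise ValueError("ADTS syncword not found")
    index = (header[2] >> 2) & 0x0F
    if index >= len(ADTS_SAMPLE_RATES):
        raise ValueError(f"reserved sampling_frequency_index {index}")
    return ADTS_SAMPLE_RATES[index]


def assumed_he_aac_output_rate(core_rate: int) -> int:
    """Hypothetical helper: assumes SBR is present and doubles the core rate."""
    return core_rate * 2


# Example header bytes (sampling_frequency_index = 6).
header = bytes([0xFF, 0xF1, 0x58, 0x80, 0x2E, 0x7F, 0xFC])
core = adts_core_sample_rate(header)
print(core, assumed_he_aac_output_rate(core))
```

Whether the doubling applies depends on how the stream signals SBR, so treat the second value as something to confirm against the decoder's reported output rate.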

Related

Problem understanding audio stream number of samples when decoded with ffmpeg

The two streams I am decoding are an audio stream (ADTS AAC, 1 channel, 44100, 8-bit, 128bps) and a video stream (H264), both received in an MPEG-TS stream. I noticed something that doesn't make sense to me when I decode the AAC audio frames and try to line up the audio and video timestamps. I'm decoding the PTS for each video and audio frame; however, I only get a PTS in the audio stream every 7 frames.
When I decode a single audio frame I get back 1024 samples, always. The frame rate is 30fps, so I see 30 frames each with 1024 samples, which comes to 30,720 samples and not the expected 44,100 samples. This is a problem when computing the timeline, as the timestamps on the frames are slightly different between the audio and video streams. It's very close, but since I compute the timestamps via (1024 samples * 1,000 / 44,100 * 10,000 ticks) it's never going to line up exactly with the 30fps video.
Am I doing something wrong here with decoding the ffmpeg audio frames, or misunderstanding audio samples?
And in my particular application, these timestamps are critical as I am trying to line up LTC timestamps which are decoded at the audio frame level, and lining those up with video frames.
FFProbe.exe:
Video:
r_frame_rate=30/1
avg_frame_rate=30/1
codec_time_base=1/60
time_base=1/90000
start_pts=7560698279
start_time=84007.758656
Audio:
r_frame_rate=0/0
avg_frame_rate=0/0
codec_time_base=1/44100
time_base=1/90000
start_pts=7560686278
start_time=84007.625311
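A rough sketch of the arithmetic behind the question, using the numbers quoted above (1024 samples per AAC frame, 44100 Hz, 90 kHz time base). It assumes that a PTS applies to the first audio frame in its PES packet and that the following frames are spaced exactly one frame duration apart; the function names are illustrative. Exact fractions are used so that rounding does not accumulate.

```python
from fractions import Fraction

SAMPLES_PER_AAC_FRAME = 1024   # as reported in the question
MPEG_TS_CLOCK_HZ = 90_000      # time_base=1/90000 in the ffprobe output


def audio_frames_per_second(sample_rate: int) -> Fraction:
    """Audio frames are paced by the sample rate, not by the video frame rate."""
    return Fraction(sample_rate, SAMPLES_PER_AAC_FRAME)


def frame_duration_ticks(sample_rate: int) -> Fraction:
    """Duration of one AAC frame in 90 kHz ticks (usually not an integer)."""
    return Fraction(SAMPLES_PER_AAC_FRAME * MPEG_TS_CLOCK_HZ, sample_rate)


def interpolate_pts(pes_pts: int, frames_in_pes: int, sample_rate: int):
    """Assumed model: the PTS belongs to the first frame of the PES packet."""
    step = frame_duration_ticks(sample_rate)
    return [pes_pts + n * step for n in range(frames_in_pes)]


print(float(audio_frames_per_second(44100)))   # about 43.07 frames per second
print(float(frame_duration_ticks(44100)))      # about 2089.8 ticks per frame
for pts in interpolate_pts(7560686278, 7, 44100):
    print(float(pts))
```

Under these assumptions there are roughly 43 audio frames per second rather than 30, so audio and video frame boundaries are not expected to coincide; each audio frame simply covers about 23.2 ms of the shared 90 kHz timeline.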

MPEG Transport Stream Audio data information

I am writing code to extract AAC audio data from an MPEG-TS stream. I want to get stream properties such as sampling frequency, number of channels, audio type, and audio profile type from the transport stream without decoding the actual data. How much of this information is available from the stream?
I would also like to know whether there is any way to find the total duration of the stream without actually finding the last PTS value in the file.
Thanks
AAC frames packed in TS use ADTS headers. The header is 7 bytes (or 9 when a CRC is present) and very easy to parse. The ADTS header format is well documented online.
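A minimal sketch of such a parser, covering the properties asked about (profile, sampling frequency, channel configuration) plus the frame length. The class and function names are illustrative; the bit layout follows the ADTS fixed and variable header as I understand it, so check it against the specification before relying on it.

```python
from dataclasses import dataclass

SAMPLE_RATES = (96000, 88200, 64000, 48000, 44100, 32000, 24000,
                22050, 16000, 12000, 11025, 8000, 7350)


@dataclass
class AdtsHeader:
    mpeg_version: int           # 4 or 2, from the ID bit
    audio_object_type: int      # profile field + 1 (2 = AAC LC)
    sample_rate: int            # Hz, from sampling_frequency_index
    channel_configuration: int  # 1 = mono, 2 = stereo, ...
    frame_length: int           # bytes, header included
    header_length: int          # 7, or 9 when a CRC follows


def parse_adts_header(data: bytes) -> AdtsHeader:
    if len(data) < 7:
        raise ValueError("need at least 7 bytes")
    # Syncword 0xFFF and layer == 0; ID and protection_absent are read below.
    if data[0] != 0xFF or (data[1] & 0xF6) != 0xF0:
        raise ValueError("ADTS syncword not found")
    sr_index = (data[2] >> 2) & 0x0F
    if sr_index >= len(SAMPLE_RATES):
        raise ValueError(f"reserved sampling_frequency_index {sr_index}")
    return AdtsHeader(
        mpeg_version=2 if data[1] & 0x08 else 4,
        audio_object_type=((data[2] >> 6) & 0x03) + 1,
        sample_rate=SAMPLE_RATES[sr_index],
        channel_configuration=((data[2] & 0x01) << 2) | (data[3] >> 6),
        frame_length=((data[3] & 0x03) << 11) | (data[4] << 3) | (data[5] >> 5),
        header_length=7 if data[1] & 0x01 else 9,
    )


print(parse_adts_header(bytes([0xFF, 0xF1, 0x50, 0x80, 0x2E, 0x7F, 0xFC])))
```

The header carries no duration field, so it does not help with the second part of the question.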

About definition for terms of audio codec

When I was studying the Cocoa Audio Queue documentation, I came across several audio codec terms. They are defined in a structure named AudioStreamBasicDescription.
Here are the terms:
1. Sample rate
2. Packet
3. Frame
4. Channel
I know about sample rate and channel, but I am confused by the other two. What do they mean?
You can also answer this question by example. For example, I have a dual-channel PCM-16 source with a sample rate of 44.1kHz, which means there are 2*44100 = 88200 samples (176400 bytes) of PCM data per second. But how about packet and frame?
Thank you in advance!
You are already familiar with the sample rate definition.
The sampling frequency or sampling rate, fs, is defined as the number of samples obtained in one second (samples per second), thus fs = 1/T, where T is the time between two consecutive samples.
So for a sampling rate of 44100 Hz, you have 44100 samples per second (per audio channel).
The number of frames per second in video is a similar concept to the number of samples per second in audio. Frames for our eyes, samples for our ears.
If you have 16-bit stereo PCM, you have 16*44100*2 = 1411200 bits per second, i.e. 176400 bytes (about 172 KiB) per second, or around 10 MB per minute.
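The same arithmetic as a small sketch (the function name is illustrative), handy for trying other sample rates, bit depths, or channel counts:

```python
def pcm_bytes_per_second(sample_rate: int, bits_per_sample: int, channels: int) -> int:
    """Uncompressed PCM data rate in bytes per second."""
    return sample_rate * bits_per_sample * channels // 8


rate = pcm_bytes_per_second(44100, 16, 2)
print(rate * 8)                  # bits per second
print(rate)                      # bytes per second
print(rate / 1024)               # KiB per second
print(rate * 60 / (1024 ** 2))   # MiB per minute
```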
Here are the definitions, reworded from Apple's documentation:
Sample: a single number representing the value of one audio channel at one point in time.
Frame: a group of one or more samples, with one sample for each channel, representing the audio on all channels at a single point in time.
Packet: a group of one or more frames, representing the audio format's smallest encoding unit, and the audio for all channels across a short amount of time.
As you can see, there is a subtle difference between the audio and video notions of a frame. For stereo audio at 44.1 kHz, one second contains 88200 samples and thus 44100 frames.
Compressed formats like MP3 and AAC pack multiple frames into packets (these packets can then be written to an MP4 file, for example, where they can be efficiently interleaved with video content). Working on larger packets gives the encoder more data in which to find patterns, which improves coding efficiency.
MP3, for example, uses packets of 1152 frames, which are the basic atomic unit of an MP3 stream. PCM audio is just a series of samples, so it can be divided down to the individual frame, and it really has no packet size at all.
For AAC you can have 1024 (or 960) frames per packet. This is described in the Apple document you pointed at:
The number of frames in a packet of audio data. For uncompressed audio, the value is 1. For variable bit-rate formats, the value is a larger fixed number, such as 1024 for AAC. For formats with a variable number of frames per packet, such as Ogg Vorbis, set this field to 0.
In MPEG-based file formats a packet is referred to as a data frame (not to be confused with the audio frame notion above). See Brad's comment for more information on the subject.
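To make the frame/packet relationship concrete, here is a small sketch (function names are illustrative) that turns the frames-per-packet figures quoted above into packet durations and sample counts at 44.1 kHz:

```python
def packet_duration_ms(frames_per_packet: int, sample_rate: int) -> float:
    """Playback time covered by one packet, in milliseconds."""
    return 1000.0 * frames_per_packet / sample_rate


def samples_per_packet(frames_per_packet: int, channels: int) -> int:
    """One frame holds one sample per channel."""
    return frames_per_packet * channels


for name, frames in (("PCM", 1), ("AAC", 1024), ("MP3", 1152)):
    print(name,
          round(packet_duration_ms(frames, 44100), 3), "ms,",
          samples_per_packet(frames, 2), "samples (stereo)")
```

So an AAC packet at 44.1 kHz covers a little over 23 ms of audio and an MP3 packet a little over 26 ms, while a PCM "packet" is a single frame.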

What exactly does bitrate mean in a video/audio file?

I use ffmpeg to convert videos from one format to another.
Is bitrate the only parameter which decides the output size of a video/audio file?
Yes, bitrate is essentially what will control the file size (for a given playback duration). It is the number of bits used to represent each second of material.
However, there are some subtleties, e.g.:
a video file encoded at a certain video bitrate probably contains a separate audio stream, with a separately specified bitrate
most file formats will contain some metadata that won't be counted towards the basic video stream bitrate
sometimes the encoder will not actually aim to achieve a specified bitrate, for example when using CRF (Constant Rate Factor) mode. http://trac.ffmpeg.org/wiki/x264EncodingGuide explains why two-pass encoding is preferred when targeting a specific file size.
So you may want to do a little experimenting with a particular set of options for a particular file format.
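As a back-of-the-envelope sketch of the relationship described above (the function names and the overhead parameter are made up for illustration, and real container overhead varies by format):

```python
def estimated_size_bytes(duration_s: float, video_kbps: float,
                         audio_kbps: float = 0.0,
                         overhead_fraction: float = 0.0) -> int:
    """Approximate file size: total bitrate times duration, plus container overhead."""
    total_bits = (video_kbps + audio_kbps) * 1000 * duration_s
    return int(total_bits / 8 * (1 + overhead_fraction))


def video_kbps_for_target(target_bytes: float, duration_s: float,
                          audio_kbps: float = 0.0) -> float:
    """Video bitrate that would hit a target size, ignoring container overhead."""
    return target_bytes * 8 / duration_s / 1000 - audio_kbps


# One minute at 1000 kbps video plus 128 kbps audio.
print(estimated_size_bytes(60, 1000, 128))
print(video_kbps_for_target(8_460_000, 60, 128))
```

This only holds when the encoder actually targets the given bitrate; with CRF the resulting size depends on the content.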
Bitrate also has a strong influence on the quality of an audio or video file.
For example, an MP3 audio file compressed at 192 kbps will usually preserve more detail and show fewer compression artifacts than the same audio compressed at 128 kbps. This is because more bits are used to represent the audio data for each second of playback.
Similarly, a video file compressed at 3000 kbps will generally look better than the same file compressed at 1000 kbps with the same codec and settings. Bitrate is only a rough proxy for quality, though: how good a given bitrate looks or sounds depends heavily on the codec and on the content.

How to find AAC-LC (non-ADTS) audio packet length

I have an AAC-LC audio stream coming directly from an audio encoder.
It's a raw stream, with no ADTS headers and no container data, because I want to stream the encoded audio directly as it arrives (before the file gets saved).
I want to determine the frame boundaries, frame lengths, and packet lengths in the incoming raw AAC stream. (AAC has variable packet lengths.)
Can I search for any fixed frame headers/patterns so that I can determine frame boundaries?
Is it possible with AAC?
Thanks in advance for your valuable inputs.
If you are taking AAC-encoded data directly from the encoder, then it's up to the encoder to deliver it frame by frame. It should not send "packets", but single frames. Otherwise I don't see a way to parse out the frames.
I'd first check whether it really sends more than one frame at a time.
If it does, one solution would be to tell the encoder to emit ADTS headers, parse the frame length from each ADTS header, and finally strip the ADTS header from each frame and stream the payload as raw AAC.
Does that help?
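A minimal sketch of that last approach, assuming the encoder can be switched to ADTS output and that each ADTS frame carries a single raw data block. The function name is illustrative. It walks the buffer using the 13-bit frame length from each header and yields the payload with the header (and CRC, if present) removed:

```python
def iter_raw_aac_frames(stream: bytes):
    """Yield raw AAC payloads from a buffer of back-to-back ADTS frames."""
    pos = 0
    while pos + 7 <= len(stream):
        # Syncword 0xFFF and layer == 0 mark the start of every ADTS frame.
        if stream[pos] != 0xFF or (stream[pos + 1] & 0xF6) != 0xF0:
            raise ValueError(f"lost ADTS sync at offset {pos}")
        # protection_absent == 1 means no CRC, so the header is 7 bytes.
        header_len = 7 if stream[pos + 1] & 0x01 else 9
        # aac_frame_length counts the header as well as the payload.
        frame_len = (((stream[pos + 3] & 0x03) << 11)
                     | (stream[pos + 4] << 3)
                     | (stream[pos + 5] >> 5))
        if frame_len < header_len:
            raise ValueError(f"invalid frame length {frame_len} at offset {pos}")
        if pos + frame_len > len(stream):
            break  # incomplete trailing frame: wait for more data
        yield stream[pos + header_len:pos + frame_len]
        pos += frame_len


# Two tiny made-up frames: lengths 10 and 9 bytes, header included.
data = bytes.fromhex("fff15080015ffc" "010203" "fff15080013ffc" "0405")
for payload in iter_raw_aac_frames(data):
    print(payload.hex())
```

Because the frame boundaries come from the length field rather than from scanning for a sync pattern, payload bytes that happen to look like a syncword do not cause false splits.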

Resources