If you have audio encoded at 44100 Hz, that means you have 44100 samples per second. Does this mean 44100 samples/sec per channel, or for all channels combined?
For example, if a song is stereo and encoded at 44100 Hz, is that 44100 samples/sec shared across both channels (i.e. 22050 samples per channel), or does every channel have its own 44100 samples (i.e. every second has 88200 samples: 44100 for channel 1 and 44100 for channel 2)?
The sample rate is per channel, independent of the number of channels, so e.g. CD-quality audio is stereo, 16 bits, 44.1 kHz, which means there are two channels, each sampled at 44.1 kHz, and the raw data rate is therefore 44100 * 2 * 16 = 1411200 bits/sec = 176400 bytes/sec.
Each channel is sampled separately, so for each sample period you get as many values as there are channels.
The data rate for PCM (uncompressed) audio is
sample_freq * channels * bits_per_sample / 8
Most common are 16-bit samples, so for your stereo recording at a 44100 sample rate you'll have 44100 * 2 * 2 = 176400 bytes per second.
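A minimal sketch of that arithmetic in plain C (no FFmpeg involved; the variable names are just illustrative):

    #include <stdio.h>

    /* Raw PCM data rate: sample_freq * channels * bits_per_sample / 8 */
    int main(void) {
        const int sample_rate     = 44100;  /* samples per second, per channel */
        const int channels        = 2;      /* stereo */
        const int bits_per_sample = 16;

        const int bytes_per_sec = sample_rate * channels * bits_per_sample / 8;
        printf("%d bytes/sec (%d bits/sec)\n", bytes_per_sec, bytes_per_sec * 8);
        /* prints: 176400 bytes/sec (1411200 bits/sec) */
        return 0;
    }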
Related
The two streams I am decoding are an audio stream (ADTS AAC, 1 channel, 44100, 8-bit, 128bps) and a video stream (H264), which are received in an MPEG-TS stream, but I noticed something that doesn't make sense to me when I decode the AAC audio frames and try to line up the audio/video stream timestamps. I'm decoding the PTS for each video and audio frame; however, I only get a PTS in the audio stream every 7 frames.
When I decode a single audio frame I always get back 1024 samples. The frame rate is 30 fps, so I see 30 frames, each with 1024 samples, which equals 30,720 samples and not the expected 44,100 samples. This is a problem when computing the timeline, as the timestamps on the frames are slightly different between the audio and video streams. It's very close, but since I compute the timestamps via (1024 samples * 1,000 / 44,100 * 10,000 ticks), it's never going to line up exactly with the 30 fps video.
Am I doing something wrong here with decoding the ffmpeg audio frames, or misunderstanding audio samples?
And in my particular application, these timestamps are critical as I am trying to line up LTC timestamps which are decoded at the audio frame level, and lining those up with video frames.
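For reference, the arithmetic described above can be written out like this (a sketch that only restates the numbers from the question; the 10,000 ticks per millisecond conversion is taken from the question as well):

    #include <stdio.h>

    int main(void) {
        /* Values from the question above. */
        const double sample_rate       = 44100.0; /* AAC sample rate                */
        const double samples_per_frame = 1024.0;  /* samples per decoded AAC frame  */
        const double video_fps         = 30.0;

        const double audio_frame_ms = samples_per_frame * 1000.0 / sample_rate; /* ~23.22 ms */
        const double video_frame_ms = 1000.0 / video_fps;                       /* ~33.33 ms */

        printf("audio frame: %.4f ms (%.1f ticks)\n", audio_frame_ms, audio_frame_ms * 10000.0);
        printf("video frame: %.4f ms (%.1f ticks)\n", video_frame_ms, video_frame_ms * 10000.0);
        /* The two durations are not simple multiples of each other, so individual
           audio and video frame timestamps rarely coincide exactly. */
        return 0;
    }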
FFProbe.exe:
Video:
r_frame_rate=30/1
avg_frame_rate=30/1
codec_time_base=1/60
time_base=1/90000
start_pts=7560698279
start_time=84007.758656
Audio:
r_frame_rate=0/0
avg_frame_rate=0/0
codec_time_base=1/44100
time_base=1/90000
start_pts=7560686278
start_time=84007.625311
I have this data:
Bit speed: 276 kilobytes/seconds
File size: 6.17 MB
Channels: 2
Layer: 3
Frequency: 44100 HZ
How can I retrieve the audio duration in seconds or milliseconds?
You can't. To get the duration you need the sampling rate in samples per second, but also the number of channels (mono, stereo, etc.) and the sample length in bytes (usually 1 to 3). And unless it is raw audio, there is also additional data that takes some space. 276 kbps does not help here. If it is an MP3 the file is compressed, so you simply can't tell just by looking at the file size.
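For raw, headerless PCM, where the sampling rate, channel count, and sample size are all known, the calculation would be straightforward; a hedged sketch (the 6.17 MB figure is from the question, the raw-PCM assumption is not):

    #include <stdio.h>

    /* Duration of headerless PCM audio with constant parameters.
       This does NOT apply to compressed formats such as MP3. */
    int main(void) {
        const double file_size_bytes  = 6.17 * 1024 * 1024; /* size from the question  */
        const int    sample_rate      = 44100;              /* samples/sec per channel */
        const int    channels         = 2;
        const int    bytes_per_sample = 2;                   /* 16-bit samples          */

        const double bytes_per_sec = (double)sample_rate * channels * bytes_per_sample;
        printf("duration: %.2f seconds\n", file_size_bytes / bytes_per_sec);
        return 0;
    }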
I have a video file, and I dumped the video info to a txt file with ffmpeg nearly 3 years ago.
...
Stream #0:1[0x1c0]: Audio: mp2, 48000 Hz, stereo, s16, 256 kb/s
Stream #0:2[0x1c1]: Audio: mp2, 48000 Hz, stereo, s16, 256 kb/s
But I found the format changed when I used the updated ffprobe (ffprobe version N-78046-g46f67f4 Copyright (c) 2007-2016 the FFmpeg developers).
...
Stream #0:1[0x1c0]: Audio: mp2, 48000 Hz, stereo, s16p, 256 kb/s
Stream #0:2[0x1c1]: Audio: mp2, 48000 Hz, stereo, s16p, 256 kb/s
For the same video, the reported sample format has changed to s16p.
I implemented a simple video player which uses ffmpeg. It could play the video 3 years ago, but it failed to output the correct PCM stream after I switched to the updated ffmpeg. I spent a lot of time on it and finally found that the audio should have been s16 instead of s16p. The decoded audio stream works after I add the following line before calling avcodec_decode_audio4,
audio_codec_ctx->sample_fmt = AV_SAMPLE_FMT_S16;
but it is just a hack. Does anyone encounter this issue? How to make ffmpeg work correctly? Any hint is appreciated. Thanks!
The output format changed. The reason for this is fairly convoluted and technical, but let me try explaining it anyway.
Most audio codecs are structured such that the output of each channel is best reconstructed individually, and the merging of channels (interleaving of a "left" and "right" buffer into an array of samples ordered left0 right0 left1 right1 [etc]) happens at the very end. You can probably imagine that if the encoder wants to deinterleave again, then transcoding of audio involves two redundant operations (interleaving/deinterleaving). Therefore, all decoders where it makes sense were switched to output planar audio (so s16 changed to s16p, where p means planar), where each channel is its own buffer.
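To make those layouts concrete, here is a hand-rolled sketch of interleaving two planar 16-bit buffers (for illustration only; in practice you would let libswresample do this, as described below):

    #include <stdint.h>
    #include <stddef.h>

    /* Interleave separate left/right (planar) buffers into L0 R0 L1 R1 ... */
    static void interleave_s16(const int16_t *left, const int16_t *right,
                               int16_t *out, size_t nb_samples)
    {
        for (size_t i = 0; i < nb_samples; i++) {
            out[2 * i]     = left[i];
            out[2 * i + 1] = right[i];
        }
    }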
So: nowadays, interleaving is done using a resampling library (libswresample) after decoding instead of as an integral part of decoding, and only if the user explicitly wants to do so, rather than automatically/always.
You can indeed set the request sample format to S16 to force decoding to s16 instead of s16p. Consider this a compatibility hack that will at some point be removed for the few decoders for which it does work, and also one that will not work for new decoders. Instead, consider adding libswresample support to your application to convert between whatever is the native output format of the decoder, and the format you want to use for further data processing (e.g. playback using sound card).
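A minimal sketch of the libswresample route, assuming stereo s16p in and interleaved s16 out at the same sample rate (uses the older swr_alloc_set_opts channel-layout API; error handling and context reuse are left out):

    #include <libavutil/frame.h>
    #include <libavutil/channel_layout.h>
    #include <libavutil/samplefmt.h>
    #include <libswresample/swresample.h>

    /* Convert one decoded planar frame to interleaved s16 in out_buf. */
    static int to_interleaved_s16(const AVFrame *frame, uint8_t *out_buf, int out_samples)
    {
        SwrContext *swr = swr_alloc_set_opts(NULL,
                AV_CH_LAYOUT_STEREO, AV_SAMPLE_FMT_S16,  frame->sample_rate,  /* output */
                AV_CH_LAYOUT_STEREO, AV_SAMPLE_FMT_S16P, frame->sample_rate,  /* input  */
                0, NULL);
        if (!swr || swr_init(swr) < 0)
            return -1;

        /* swr_convert takes arrays of plane pointers on both sides. */
        int converted = swr_convert(swr, &out_buf, out_samples,
                                    (const uint8_t **)frame->extended_data,
                                    frame->nb_samples);
        swr_free(&swr);
        return converted; /* samples per channel written, or <0 on error */
    }

In a real player you would normally create the SwrContext once and reuse it for every frame rather than allocating it per call.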
When I was studying the Cocoa Audio Queue documentation, I came across several audio codec terms. They are defined in a structure named AudioStreamBasicDescription.
Here are the terms:
1. Sample rate
2. Packet
3. Frame
4. Channel
I know about sample rate and channel, but I am confused by the other two. What do the other two terms mean?
You can also answer this question by example. For example, I have a dual-channel PCM-16 source with a sample rate of 44.1 kHz, which means there are 2 * 44100 = 88200 samples (176,400 bytes) of PCM data per second. But how about packet and frame?
Thank you in advance!
You are already familiar with the sample rate definition.
The sampling frequency or sampling rate, fs, is defined as the number of samples obtained in one second (samples per second), thus fs = 1/T.
So for a sampling rate of 44100 Hz, you have 44100 samples per second (per audio channel).
The number of frames per second in video is a similar concept to the number of samples per second in audio. Frames for our eyes, samples for our ears. Additional info here.
If you have 16-bit stereo PCM, it means you have 16 * 44100 * 2 = 1411200 bits per second => ~172 kB per second => around 10 MB per minute.
Here are the definitions from Apple, in reworded terms:
Sample: a single number representing the value of one audio channel at one point in time.
Frame: a group of one or more samples, with one sample for each channel, representing the audio on all channels at a single point in time.
Packet: a group of one or more frames, representing the audio format's smallest encoding unit, and the audio for all channels across a short amount of time.
As you can see, there is a subtle difference between the audio and video frame notions. In one second of stereo audio at 44.1 kHz you have 88200 samples and thus 44100 frames.
Compressed formats like MP3 and AAC pack multiple frames into packets (these packets can then be written into an MP4 file, for example, where they can be efficiently interleaved with video content). Dealing with larger packets helps the encoder identify bit patterns for better coding efficiency.
MP3, for example, uses packets of 1152 frames, which form the basic atomic unit of an MP3 stream. PCM audio is just a series of samples, so it can be divided down to the individual frame, and it really has no packet size at all.
For AAC you can have 1024 (or 960) frames per packet. This is described in the Apple document you pointed at:
The number of frames in a packet of audio data. For uncompressed audio, the value is 1. For variable bit-rate formats, the value is a larger fixed number, such as 1024 for AAC. For formats with a variable number of frames per packet, such as Ogg Vorbis, set this field to 0.
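As a worked example of those figures, the time covered by one packet is simply frames-per-packet divided by the sample rate (the 1024 and 1152 values come from the text above):

    #include <stdio.h>

    int main(void) {
        const double sample_rate = 44100.0;
        const double aac_frames  = 1024.0; /* frames per AAC packet */
        const double mp3_frames  = 1152.0; /* frames per MP3 packet */

        printf("AAC packet: %.2f ms\n", aac_frames * 1000.0 / sample_rate); /* ~23.22 ms */
        printf("MP3 packet: %.2f ms\n", mp3_frames * 1000.0 / sample_rate); /* ~26.12 ms */
        return 0;
    }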
In MPEG-based file formats, a packet is referred to as a data frame (not to be confused with the audio frame notion above). See Brad's comment for more information on the subject.
What are the absolute maximum bitrate the standardized MPEG-2 Part 7 AAC (ISO/IEC 13818-7:1997) and MPEG-4 Audio AAC (ISO/IEC 14496-3:1999) can output by the specifications?