AAC Packet Size - audio

I am working on an M4a file with the following metadata:
Metadata:
major_brand : M4A
minor_version : 0
compatible_brands: M4A mp42isom
creation_time : 2019-08-14T13:45:39.000000Z
iTunSMPB : 00000000 00000840 00000000 00000000000387C0 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Duration: 00:00:05.25, start: 0.047891, bitrate: 69 kb/s
Stream #0:0(und): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, mono, fltp, 65 kb/s (default)
The audio duration = 5246.2585 ms
I am trying to calculate a number of frames using the following formula:
duration * sampling rate / frame size = 5246.2585 * 44.1/1024 = 225.9375 frames
I tried multiple files and it always gives xxx.9375 frames.
However, using FFprobe:
ffprobe -i audio.m4a -show_streams -hide_banner
I am getting:
nb_frames=228
There is always a 2.0625 difference between my calculations and FFprobe output.
Any ideas what I am doing wrong here? How can I accurately calculate the number of frames?

In AAC, there is one packet for every 1024 samples, but each packet affects 2048 samples, so each sample is partly encoded in two packets. Therefore, if you want to properly represent N packets' worth of audio samples, you need to use N+1 packets.
If we think of this as each packet affecting the corresponding 1024 samples as well as the next block of samples, then it means that the first 1024 samples cannot be properly represented, so common practice is to pre-pad the signal with zeros in the encoder. On playback these will be discarded, and that's why the duration of the signal is less than you would expect by counting packets.
For some reason, the common practice is actually to pad out with 2112 samples instead of just 1024. The length of padding isn't actually recorded in the AAC file, and isn't specified in the standard, so everybody just uses 2112 to be compatible with everyone else.
2112 samples is exactly 2.0625 packets.
If you want to learn more about this, the magic Google words are "AAC priming".
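Applying this to the numbers in the question: 5246.2585 ms × 44.1 samples/ms = 231,360 samples, and (231,360 + 2,112) / 1,024 = 228 packets, which matches the nb_frames=228 reported by FFprobe.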

Related

How to repackage mov/mp4 file into HLS playlist with multiple audio streams

I'm trying to convert some videos (in different formats, e.g., mp4, mov) which contain one video stream and multiple audio streams into one HLS playlist with multiple audio streams (treated as languages) and only one video stream.
I already browsed a lot of Stack Overflow threads and tried many different approaches, but I was only able to find answers for creating separate HLS playlists, each with a different audio track.
Sample scenario which I have to handle:
I have one mov file, containing one video stream and 2 audio streams.
I need to create an HLS playlist from this mov file which will use this one video stream, but will encode these 2 audio streams as language tracks (say, ENG and FRA).
The prepared HLS playlist can later be streamed in a player, and the end user would be able to switch between audio tracks while watching the clip.
What I was able to achieve so far is creating multiple HLS playlists, each with a different audio track.
ffmpeg -i "file_name.mp4" \
-map 0:v -map 0:a -c:v copy -c:a copy -start_number 0 \
-f hls \
-hls_time 10 \
-hls_playlist_type vod \
-hls_list_size 0 \
-master_pl_name master_playlist_name.m3u8 \
-var_stream_map "v:0,agroup:groupname a:0,agroup:groupname,language:ENG a:1,agroup:groupname" file_name_%v_.m3u8
My biggest issue is that I'm having a hard time understanding how the -map and -var_stream_map options should be used in my case, or whether they should even be used in this scenario.
An example of the output of the ffmpeg -i command on the original mov file which should be converted into HLS:
Stream #0:0[0x1](eng): Video: h264 (High) (avc1 / 0x31637661), yuv420p(tv, bt709, progressive), 1920x1080, 8786 kb/s, 25 fps, 25 tbr, 12800 tbn (default)
Metadata:
handler_name : Apple Video Media Handler
vendor_id : [0][0][0][0]
timecode : 00:00:56:05
Stream #0:1[0x2](und): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 127 kb/s (default)
Metadata:
handler_name : SoundHandler
vendor_id : [0][0][0][0]
Stream #0:2[0x3](und): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 127 kb/s
Metadata:
handler_name : SoundHandler
vendor_id : [0][0][0][0]
I also checked this blogpost and I would like to achieve this exact effect, but with video, not with audio.
For example, -var_stream_map "v:0,a:0 v:1,a:0 v:2,a:0" implies that
the audio stream denoted by a:0 is used in all three video renditions.
The stream_map looks fine. However, the HLS muxer will not create a valid HLS playlist, because it is missing the codec and bitrate information from the input stream: the audio and video streams are copied (remember -c:v copy -c:a copy), not parsed or re-encoded. To add that information, use the -tag and -b options to specify the properties of all your video and audio streams in the HLS output.
Example for your video stream:
-tag:v:0 h264 -b:v:0 8786k
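Folding those hints back into the command from the question might look roughly like this (an untested sketch: the -map selections, the 128k audio bitrates and the FRA language tag are assumptions made for illustration, not values from the original post):
ffmpeg -i "file_name.mp4" \
-map 0:v:0 -map 0:a:0 -map 0:a:1 -c:v copy -c:a copy -start_number 0 \
-tag:v:0 h264 -b:v:0 8786k -b:a:0 128k -b:a:1 128k \
-f hls \
-hls_time 10 \
-hls_playlist_type vod \
-hls_list_size 0 \
-master_pl_name master_playlist_name.m3u8 \
-var_stream_map "v:0,agroup:groupname a:0,agroup:groupname,language:ENG a:1,agroup:groupname,language:FRA" file_name_%v_.m3u8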

Slow audio-video sync drift when merging wav and mp4 with ffmpeg

I have an mp4 file with only a single video stream (no audio) and a wav audio file that I would like to add to the video using ffmpeg. The audio and the video have been recorded simultaneously during a conference, the former from a mixer output on a PC and the latter from a digital videocamera.
I am using this ffmpeg command:
ffmpeg -i incontro3.mp4 -itsoffset 18.39 -i audio_mix.wav -c:v copy -c:a aac final-video.mp4
where I'm using the -itsoffset 18.39 option since I know that 18.39s is the video-audio delay.
The problem I'm experiencing is that in the output file, while the audio is perfectly in sync with the video at the beginning, it slowly drifts out of sync during the movie.
The output of ffprobe on the video file is:
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'incontro3.mp4':
Metadata:
major_brand : isom
minor_version : 512
compatible_brands: isomiso2avc1mp41
encoder : Lavf57.25.100
Duration: 00:47:22.56, start: 0.000000, bitrate: 888 kb/s
Stream #0:0(und): Video: h264 (High) (avc1 / 0x31637661), yuv420p, 1280x720 [SAR 1:1 DAR 16:9], 886 kb/s, 25 fps, 25 tbr, 12800 tbn (default)
Metadata:
handler_name : VideoHandler
and the ffprobe output for the audio file is:
Input #0, wav, from 'audio_mix.wav':
Metadata:
track : 5
encoder : Lavf57.25.100
Duration: 00:46:32.20, bitrate: 1411 kb/s
Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 44100 Hz, 2 channels, s16, 1411 kb/s
I'm using the latest ffmpeg Zeranoe windows build git-9591ca7 (2016-05-25).
Thanks in anticipation for any help/ideas!
UPDATE 1: It looks like the problem is upstream of the video-audio merging, and could be in the concatenation and conversion of the MTS files generated by the video camera into the mp4 video. I will follow up as I make progress in understanding...
UPDATE 2: The problem is not in the initial merging of the MTS files generated by the camera. Or, at least, it occurs identically whether I merge them with cat or with ffmpeg -f concat.
UPDATE 3: Following @Mulvya's suggestion, I observed that the drift rate is constant (at least as far as I can tell judging by eye). I also tried to superimpose the A/V tracks with another piece of software, and the drift is exactly the same, thereby ruling out ffmpeg as the culprit. My (bad) feeling is that the issue could be related to the internal clocks of the digital video camera and the laptop used for audio recording running at slightly different rates (see here the report of an identical issue I just found).
Since the drift rate is constant, you can use a combination of FFmpeg filters to retime the audio.
ffmpeg -i audio_mix.wav -af "asetrate=44100*(10/9),aresample=44100" retimed.wav
Here, 44100*(10/9) is the actual number of samples that represent 1 second of sound: if after 100 seconds of playback of the original WAV the audio just heard corresponds to the 90th second of the event, then the sample consumption rate should be increased by 10/9. That would make for an unconventional sample rate, so aresample is added to resample it back to a standard rate.
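More generally, if after t seconds of playback of the WAV the audio just heard corresponds to second t − d of the event, the retimed rate is 44100 × t / (t − d); with the example figures above (t = 100, d = 10) that gives 44100 × 10/9 = 49000, which aresample then converts back down to 44100.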

audio sample format s16p, ffmpeg or audio codec bug?

I have a video file and I had dumped the video info to a txt file with ffmpeg nearly 3 years ago.
...
Stream #0:1[0x1c0]: Audio: mp2, 48000 Hz, stereo, s16, 256 kb/s
Stream #0:2[0x1c1]: Audio: mp2, 48000 Hz, stereo, s16, 256 kb/s
But I found the format had changed when I used an updated ffprobe (ffprobe version N-78046-g46f67f4 Copyright (c) 2007-2016 the FFmpeg developers).
...
Stream #0:1[0x1c0]: Audio: mp2, 48000 Hz, stereo, s16p, 256 kb/s
Stream #0:2[0x1c1]: Audio: mp2, 48000 Hz, stereo, s16p, 256 kb/s
For the same video, the sample format now shows as s16p.
I implemented a simple video player which uses ffmpeg. It could play this video 3 years ago, but it failed to output the correct PCM stream after I switched to the updated ffmpeg. I spent a lot of time and finally found that the audio is now decoded as s16p where my player expects s16. The decoded audio stream works after I add the following line before calling avcodec_decode_audio4,
audio_codec_ctx->sample_fmt = AV_SAMPLE_FMT_S16;
but it is just a hack. Does anyone encounter this issue? How to make ffmpeg work correctly? Any hint is appreciated. Thanks!
The output format changed. The reason for this is fairly convoluted and technical, but let me try explaining it anyway.
Most audio codecs are structured such that the output of each channel is best reconstructed individually, and the merging of channels (interleaving of a "left" and "right" buffer into an array of samples ordered left0 right0 left1 right1 [etc]) happens at the very end. You can probably imagine that if the encoder wants to deinterleave again, then transcoding of audio involves two redundant operations (interleaving/deinterleaving). Therefore, all decoders where it makes sense were switched to output planar audio (so s16 changed to s16p, where p means planar), where each channel is its own buffer.
So: nowadays, interleaving is done using a resampling library (libswresample) after decoding instead of as an integral part of decoding, and only if the user explicitly wants to do so, rather than automatically/always.
You can indeed set the request sample format (request_sample_fmt on the codec context) to S16 to force decoding to s16 instead of s16p. Consider this a compatibility hack that will at some point be removed for the few decoders for which it does work, and one that will not work for new decoders. Instead, consider adding libswresample support to your application to convert between whatever the native output format of the decoder is and the format you want to use for further processing (e.g. playback through the sound card).
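As a rough illustration of that last suggestion, converting the decoder's planar output back to interleaved s16 with libswresample could look something like the sketch below. It is written against the same FFmpeg-3.x-era API as avcodec_decode_audio4 (current releases use swr_alloc_set_opts2 with AVChannelLayout instead), and the stereo layout and the function name are assumptions for illustration:
#include <libavutil/channel_layout.h>
#include <libavutil/frame.h>
#include <libswresample/swresample.h>

/* Convert one decoded planar frame (s16p) to interleaved s16.
   out_buf must hold frame->nb_samples * 2 channels * 2 bytes.
   Returns samples per channel written, or a negative error code. */
int planar_to_interleaved_s16(AVFrame *frame, uint8_t *out_buf)
{
    SwrContext *swr = swr_alloc_set_opts(NULL,
        AV_CH_LAYOUT_STEREO, AV_SAMPLE_FMT_S16,  frame->sample_rate,  /* output */
        AV_CH_LAYOUT_STEREO, AV_SAMPLE_FMT_S16P, frame->sample_rate,  /* input  */
        0, NULL);
    if (!swr || swr_init(swr) < 0)
        return -1;

    int n = swr_convert(swr, &out_buf, frame->nb_samples,
                        (const uint8_t **)frame->extended_data, frame->nb_samples);
    swr_free(&swr);
    return n;
}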

Convert from 30 to 60fps by increasing speed, not duplicating frames, using FFmpeg

I have a video that is incorrectly labelled at 30fps, it is actually 60fps and so looks like it's being played at half speed. The audio is fine, that is, the soundtrack finishes half way through the video clip. I'd like to know how, if possible to fix this, that is double the video speed, making it 60fps and meaning that the audio and video are synced.
The file is H.264 and the audio MPEG-4 AAC.
File details as given by ffmpeg, as requested:
ffmpeg version 0.8.9-6:0.8.9-0ubuntu0.13.10.1, Copyright (c) 2000-2013 the Libav developers
built on Nov 9 2013 19:09:46 with gcc 4.8.1
*** THIS PROGRAM IS DEPRECATED ***
This program is only provided for compatibility and will be removed in a future release. Please use avconv instead.
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from './Tignes60fps.mp4':
Metadata:
major_brand : mp42
minor_version : 0
compatible_brands: isommp42
creation_time : 2014-01-13 02:23:09
Duration: 00:08:33.21, start: 0.000000, bitrate: 5690 kb/s
Stream #0.0(eng): Video: h264 (High), yuv420p, 1920x1080 [PAR 1:1 DAR 16:9], 5609 kb/s, 29.97 fps, 29.97 tbr, 30k tbn, 59.94 tbc
Metadata:
creation_time : 2014-01-13 02:23:09
Stream #0.1(eng): Audio: aac, 48000 Hz, stereo, s16, 156 kb/s
Metadata:
creation_time : 2014-01-13 02:23:09
At least one output file must be specified
Use -vsync drop:
ffmpeg -i input.avi -vcodec copy -vsync drop -r 60 output.avi
Source timestamps will be destroyed and the output muxer will create new ones based on the given frame rate (the -r switch).
Okay so here's how I achieved what I wanted.
avconv -i input.mp4 -r 60 -filter:v "setpts=0.5*PTS" output.mp4
This left the audio unchanged, so it now synced up nicely with the video.
This was originally a video that was incorrectly exported as 30fps when really it was 60, so the video was playing at half speed for twice as long, with the audio track finishing halfway through. The above fixed this: it sped up the video without losing frames, so it now plays at 60fps, at normal speed, and in sync with the audio.
Credit to rogerdpack for suggesting setpts, but you were very minimal! A fuller answer would have been appreciated!
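For reference, the same approach with a current ffmpeg build (rather than the deprecated avconv wrapper shown in the banner above) would presumably be along these lines, copying the audio untouched while re-encoding the retimed video:
ffmpeg -i input.mp4 -filter:v "setpts=0.5*PTS" -r 60 -c:a copy output.mp4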

How do I alter my FFMPEG command to make my HTTP Live Streams more efficient?

I want to reduce the muxing overhead when creating .ts files using FFMPEG.
I'm using FFMPEG to create a series of transport stream files used for HTTP live streaming.
./ffmpeg -i myInputFile.ismv \
-vcodec copy \
-acodec copy \
-bsf h264_mp4toannexb \
-map 0 \
-f segment \
-segment_time 10 \
-segment_list_size 999999 \
-segment_list output/myVarientPlaylist.m3u8 \
-segment_format mpegts \
output/myAudioVideoFile-%04d.ts
My input is in ismv format and contains a video and audio stream:
Stream #0:0(und): Video: h264 (High) (avc1 / 0x31637661), yuv420p, 320x240, 348 kb/s, 29.97 tbr, 10000k tbn, 59.94 tbc
Stream #0:1(und): Audio: aac (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 63 kb/s
There is an issue related to muxing that is causing a large amount of overhead to be added to the streams. This is how the issue was described to me for the audio:
So for a given aac stream, the overhead will be 88% (since 200 bytes will map to 2 x 188 byte packets).
For video, the I-frame packets are quite large, so they translate nicely into .ts packets; however, the difference frames can be as small as an audio packet, so they suffer from the same issue.
The solution is to combine several aac packets into one larger stream before packaging them into .ts. Is this possible out of the box with FFMPEG?
It is not possible. Codecs rely on the encapsulating container for framing, which means signaling the start and length of a frame.
Your graphic actually misses an element, which is the PES packet. Your audio frame will be put into a PES packet first (which indicates its length), then the PES packet will be cut into smaller chunks which become TS packets.
By design you cannot start a new PES packet (containing an audio frame in your case) in a TS packet which already contains data. A new PES packet always starts in a new TS packet. Otherwise it would be impossible to start playing mid-stream (the broadcast situation): it would be impossible to know at which byte in the TS packet the new PES begins (remember, you have missed the beginning of the current PES).
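To put rough numbers on the 88% figure quoted above: a ~200-byte AAC frame plus a minimal PES header of about 14 bytes has to start in a fresh TS packet, so it occupies 2 × 188 = 376 bytes on the wire, and 376 / 200 ≈ 1.88, i.e. roughly 88% overhead.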
There are some mitigating factors: the FF FF FF padding will probably be compressed by the networking hardware. Also, if you are using HTTP (instead of UDP or RTP), gzip compression can be enabled (but I doubt it would help much).
I've fixed the worst problem of syncing the TS output on each frame in http://git.videolan.org/gitweb.cgi/ffmpeg.git/?a=commit;h=75c8d7c2b4972f6ba2cef605949f57322f7c0361 - please try a version after that.
