What are the absolute maximum bitrates that the standardized MPEG-2 Part 7 AAC (ISO/IEC 13818-7:1997) and MPEG-4 Audio AAC (ISO/IEC 14496-3:1999) can output according to their specifications?
I have a stream of audio bytes and I'm doing a live stream using HLS. First I convert a few audio bytes into WAV chunks, and then convert the WAV to AAC. Converting to AAC with FFmpeg adds an extra 128 ms to every chunk. Because of this extra 128 ms per chunk, the total audio length grows significantly over time compared to the original audio length.
I tried reading the audio chunks in multiples of 1024 samples for the AAC conversion, but it didn't help.
I'm trying to convert some videos (in different formats, e.g. mp4, mov) which contain one video stream and multiple audio streams into one HLS playlist with multiple audio streams (treated as languages) and only one video stream.
I have already browsed a lot of Stack Overflow threads and tried many different approaches, but I was only able to find answers for creating different HLS playlists, each with a different audio track.
Sample scenario which I have to handle:
I have one mov file, containing one video stream and 2 audio streams.
I need to create an HLS playlist from this mov file which uses the one video stream but encodes the 2 audio streams as language tracks (let's say ENG and FRA).
The prepared HLS stream could later be played in a player, and the end user would be able to switch between audio tracks while watching the clip.
What I have been able to achieve is creating multiple HLS playlists, each with a different audio track.
ffmpeg -i "file_name.mp4" \
-map 0:v -map 0:a -c:v copy -c:a copy -start_number 0 \
-f hls \
-hls_time 10 \
-hls_playlist_type vod \
-hls_list_size 0 \
-master_pl_name master_playlist_name.m3u8 \
-var_stream_map "v:0,agroup:groupname a:0,agroup:groupname,language:ENG a:1,agroup:groupname" file_name_%v_.m3u8
My biggest issue is that I'm having a hard time understanding how the -map and -var_stream_map options should be used in my case, or whether they should be used at all in this scenario.
Here is an example of the output of ffmpeg -i on the original mov file which is to be converted into HLS:
Stream #0:0[0x1](eng): Video: h264 (High) (avc1 / 0x31637661), yuv420p(tv, bt709, progressive), 1920x1080, 8786 kb/s, 25 fps, 25 tbr, 12800 tbn (default)
Metadata:
handler_name : Apple Video Media Handler
vendor_id : [0][0][0][0]
timecode : 00:00:56:05
Stream #0:1[0x2](und): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 127 kb/s (default)
Metadata:
handler_name : SoundHandler
vendor_id : [0][0][0][0]
Stream #0:2[0x3](und): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 127 kb/s
Metadata:
handler_name : SoundHandler
vendor_id : [0][0][0][0]
I also checked this blogpost and I would like to achieve this exact effect, but with video, not with audio.
For example, -var_stream_map "v:0,a:0 v:1,a:0 v:2,a:0" implies that
the audio stream denoted by a:0 is used in all three video renditions.
The -var_stream_map looks fine. However, the HLS muxer will not create a valid HLS playlist, because it is missing the codec and bitrate information from the input streams: the audio and video streams are copied (remember -c:v copy -c:a copy), not parsed or re-encoded. To add that information, use the -tag and -b options to specify the properties of all your video and audio streams in the HLS output.
Example for your video stream:
-tag:v:0 h264 -b:v:0 8786k
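Putting the pieces together, a full command for the two-language case might look roughly like the sketch below (the input name, agroup name, audio bitrates, default flag, and the ENG/FRA tags are assumptions based on the ffprobe output above, so adjust them to your actual streams):
ffmpeg -i "file_name.mov" \
-map 0:v:0 -map 0:a:0 -map 0:a:1 \
-c:v copy -c:a copy \
-tag:v:0 h264 -b:v:0 8786k \
-b:a:0 128k -b:a:1 128k \
-f hls \
-hls_time 10 \
-hls_playlist_type vod \
-hls_list_size 0 \
-master_pl_name master_playlist_name.m3u8 \
-var_stream_map "v:0,agroup:audio a:0,agroup:audio,language:ENG,default:yes a:1,agroup:audio,language:FRA" \
file_name_%v_.m3u8
With this, the master playlist should advertise two EXT-X-MEDIA audio entries (ENG and FRA) attached to the single video rendition, which is what lets the player switch between the language tracks.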
I have gotten a set of FLAC (audio) files from a friend. I copied them to my Sonos music library, and got set to enjoy a nice album. Unfortunately, Sonos would not play the files. As a result I have been getting to know ffmpeg.
Sonos' complaint about the FLAC files was that they were "encoded at an unsupported sample rate". With rolling eyes and a shaking head, I note that the free VLC media player happily plays these files, but the product I've paid for (Sonos) does not. But I digress...
ffprobe revealed that the FLAC files contain both an Audio channel and a Video channel:
$ ffprobe -hide_banner -show_streams "/path/to/Myaudio.flac"
Duration: 00:02:23.17, start: 0.000000, bitrate: 6176 kb/s
Stream #0:0: Audio: flac, 176400 Hz, stereo, s32 (24 bit)
Stream #0:1: Video: mjpeg (Progressive), yuvj444p(pc, bt470bg/unknown/unknown), 450x446 [SAR 72:72 DAR 225:223], 90k tbr, 90k tbn, 90k tbc (attached pic)
Metadata:
comment : Cover (front)
Cool! I guess this is how some audio players are able to display the 'album artwork' when they play a song? Note also that the Audio stream is reported at 176400 Hz! Apparently I'm out of touch; I thought that a 44.1 kHz sampling rate effectively removed all of the 'sampling artifacts' we could hear. Anyway, I learned that Sonos supports a max of 48 kHz sampling rate, and this (the 176.4 kHz rate) is what Sonos was unhappy about. I used ffmpeg to 'dumb it down' for them:
$ ffmpeg -i "/path/to/Myaudio.flac" -sample_fmt s32 -ar 48000 "/path/to/Myaudio48K.flac"
This seemed to work - at least I got a FLAC file that Sonos would play. However, I also got what looks like a warning of some sort:
[swscaler # 0x108e0d000] deprecated pixel format used, make sure you did set range correctly
[flac # 0x7feefd812a00] Frame rate very high for a muxer not efficiently supporting it.
Please consider specifying a lower framerate, a different muxer or -vsync 2
A bit more research turned up this answer, which I don't quite understand, but a comment there says "not to worry", at least with respect to the swscaler part of the warning.
And that (finally) brings me to my questions:
1.a. What framerate, muxer & other specifications make a graphic compatible with a majority of programs that use the graphic?
1.b. How should I use ffmpeg to modify the Video channel to set these specifications (ref. Q 1.a.)?
2.a. How do I remove the Video channel from the .flac audio file?
2.b. How do I add a Video channel into a .flac file?
EDIT:
I asked the above (4) questions after failing to accomplish a 'direct' conversion (a single ffmpeg command) from FLAC at 176.4 kHz to ALAC (.m4a) at 48 kHz (max supported by Sonos). I reasoned that an 'incremental' approach through a series of conversions might get me there. With the advantage of hindsight, I now see I should have posted my original failed direct conversion incantation... we live and learn.
That said, the accepted answer below meets my final objective to convert a FLAC file encoded at 176.4kHz to an ALAC (.m4a) at 48kHz, and preserve the cover art/video channel.
What framerate, muxer & other specifications make a graphic compatible with a majority of programs that use the graphic?
Cover art is just a single frame, so framerate has no relevance in this case. However, you don't want a full video stream; it has to remain a single image, so -vsync 0 should be added. Muxer is simply the specific term for the packager as used in media file processing; it is determined by the choice of format, e.g. FLAC, WAV, etc. What's important is the codec for the cover art; usually it's PNG or JPEG. For FLAC, PNG is the default codec.
How do I remove the Video channel from the .flac audio file?
ffmpeg -i "/path/to/Myaudio.flac" -vn -c copy "/path/to/Myaudio48K.flac"
(All this does is skip any video in the input and copy everything else)
How do I add a Video channel into a .flac file?
To add cover art to audio-only formats like MP3, FLAC, etc., the video stream has to have a disposition of attached picture. So,
ffmpeg -i "/path/to/Myaudio.flac" -i coverimage -sample_fmt s32 -ar 48000 -disposition:v attached_pic -vsync 0 "/path/to/Myaudio48K.flac"
For direct conversion to ALAC, use
ffmpeg -i "/path/to/Myaudio.flac" -i coverimage -ar 48000 -c:a alac -disposition:v attached_pic -vsync 0 -c:v png "/path/to/Myaudio48K.m4a"
I have a task to add running text to a video stream (or file) as it is received. The video needs to play on a Cubieboard running Armbian. I tested with mpv using the --hwdec=vdpau flag, and the video runs more smoothly than without it. To add the running text I tried the lavfi drawtext filter, but when I use it, mpv falls back to software decoding and the lag is visible. Here is one of the commands I used:
mpv --hwdec=vdpau Videos/VID* -vf lavfi=[drawtext=fontsize=40:fontcolor=yellow:x=w-50*t:y=h/2:textfile=livetext.txt:reload=1]
And here is the output from that command with --msg-level=vd=v; it is from my work PC, but on the Cubieboard it also warns about audio/video desync:
Playing: Videos/VID_20180129_120726.mp4
(+) Video --vid=1 (*) (h264 1080x1920 30.000fps)
(+) Audio --aid=1 --alang=eng (*) (aac 2ch 44100Hz)
[vd] Container reported FPS: 30.000000
[vd] Codec list:
[vd] h264 - H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10
[vd] h264_crystalhd (h264) - CrystalHD H264 decoder
[vd] h264_cuvid (h264) - Nvidia CUVID H264 decoder
[vd] Opening video decoder h264
[vd] Probing 'vdpau'...
[vd] Trying hardware decoding.
[vd] Selected video codec: h264 (H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10)
Opening video filter: [lavfi graph=drawtext=fontsize=40:fontcolor=yellow:x=w-50*t:y=h/2:textfile=/opt/mpv-text/livetext.txt:reload=1]
[vd] Pixel formats supported by decoder: vdpau vaapi_vld yuv420p
[vd] Codec profile: High (0x64)
[vd] Requesting pixfmt 'vdpau' from decoder.
Using hardware decoding (vdpau).
[vd] Decoder format: 1080x1920 vdpau[yuv420p] bt.709/bt.709/bt.1886/limited CL=mpeg2/4/h264
[ffmpeg] Impossible to convert between the formats supported by the filter 'src' and the filter 'auto_scaler_0'
[lavfi] Can't configure libavfilter graph.
Video filter chain:
[in] 1080x1920 vdpau[yuv420p] bt.709/bt.709/bt.1886/limited SP=1.000000 CL=mpeg2/4/h264
[lavfi] "lavfi.00" 1080x1920 vdpau[yuv420p] bt.709/bt.709/bt.1886/limited SP=1.000000 CL=mpeg2/4/h264 <---
[out] ???
Falling back to software decoding.
[vd] Detected 8 logical cores.
[vd] Requesting 9 threads for decoding.
AO: [pulse] 44100Hz stereo 2ch float
[vd] Decoder format: 1080x1920 yuv420p bt.709/bt.709/bt.1886/limited CL=mpeg2/4/h264
VO: [opengl] 1080x1920 yuv420p
AV: 00:00:03 / 00:00:36 (9%) A-V: 0.000
[vd] Uninit video.
After a long search, I doubt that hardware-accelerated decoding with mpv is possible here. If so, could you suggest other tools to achieve this? I am a newbie in this area, and maybe there is a more efficient way to add running text to video. Thanks.
I found a subtitle format that can move text (ASS), so no video filters are needed. The \move(x1,y1,x2,y2) tag can do that; the tag reference is below:
http://docs.aegisub.org/3.2/ASS_Tags/
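A minimal sketch of such a subtitle file (the file name ticker.ass, the timings, and the text are placeholders; the resolution matches the 1080x1920 video from the log above):
[Script Info]
ScriptType: v4.00+
PlayResX: 1080
PlayResY: 1920

[V4+ Styles]
Format: Name, Fontname, Fontsize, PrimaryColour, SecondaryColour, OutlineColour, BackColour, Bold, Italic, Underline, StrikeOut, ScaleX, ScaleY, Spacing, Angle, BorderStyle, Outline, Shadow, Alignment, MarginL, MarginR, MarginV, Encoding
Style: Default,Arial,40,&H0000FFFF,&H0000FFFF,&H00000000,&H00000000,0,0,0,0,100,100,0,0,1,2,0,5,0,0,0,1

[Events]
Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
Dialogue: 0,0:00:00.00,0:00:30.00,Default,,0,0,0,,{\move(1080,960,-400,960)}Running text goes here
Loaded with something like mpv --hwdec=vdpau --sub-file=ticker.ass Videos/VID*, the text is drawn by the subtitle renderer instead of a video filter, so hardware decoding should not be forced off.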
I have a video file, and I dumped its info to a txt file with ffmpeg nearly 3 years ago.
...
Stream #0:1[0x1c0]: Audio: mp2, 48000 Hz, stereo, s16, 256 kb/s
Stream #0:2[0x1c1]: Audio: mp2, 48000 Hz, stereo, s16, 256 kb/s
But I found that the format had changed when I used an updated ffprobe (ffprobe version N-78046-g46f67f4 Copyright (c) 2007-2016 the FFmpeg developers).
...
Stream #0:1[0x1c0]: Audio: mp2, 48000 Hz, stereo, s16p, 256 kb/s
Stream #0:2[0x1c1]: Audio: mp2, 48000 Hz, stereo, s16p, 256 kb/s
For the same video, the sample format is now reported as s16p.
I implemented a simple video player which uses ffmpeg. It could play video 3 years ago, but it failed to output the correct PCM stream after I switched to the updated ffmpeg. I spent a lot of time on it and finally found that the audio should have been s16 instead of s16p. The decoded audio stream works after I add this line before calling avcodec_decode_audio4:
audio_codec_ctx->sample_fmt = AV_SAMPLE_FMT_S16
but it is just a hack. Has anyone else encountered this issue? How do I make ffmpeg work correctly? Any hint is appreciated. Thanks!
The output format changed. The reason for this is fairly convoluted and technical, but let me try explaining it anyway.
Most audio codecs are structured such that the output of each channel is best reconstructed individually, and the merging of channels (interleaving of a "left" and "right" buffer into an array of samples ordered left0 right0 left1 right1 [etc]) happens at the very end. You can probably imagine that if the encoder wants to deinterleave again, then transcoding of audio involves two redundant operations (interleaving/deinterleaving). Therefore, all decoders where it makes sense were switched to output planar audio (so s16 changed to s16p, where p means planar), where each channel is its own buffer.
So: nowadays, interleaving is done using a resampling library (libswresample) after decoding instead of as an integral part of decoding, and only if the user explicitly wants to do so, rather than automatically/always.
You can indeed set the request sample format to S16 to force decoding to s16 instead of s16p. Consider this a compatibility hack that will at some point be removed for the few decoders for which it does work, and also one that will not work for new decoders. Instead, consider adding libswresample support to your application to convert between whatever is the native output format of the decoder, and the format you want to use for further data processing (e.g. playback using sound card).
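If it helps, here is a rough sketch of the libswresample route (using the same avcodec_decode_audio4-era API as the question; audio_codec_ctx is assumed to be your audio decoder context, frame a decoded AVFrame, and all error checking is omitted):
#include <libavutil/channel_layout.h>
#include <libavutil/mem.h>
#include <libavutil/samplefmt.h>
#include <libswresample/swresample.h>

/* Converter from the decoder's native (possibly planar) format
   to interleaved s16 at the same rate and channel layout. */
SwrContext *swr = swr_alloc_set_opts(NULL,
        av_get_default_channel_layout(audio_codec_ctx->channels),
        AV_SAMPLE_FMT_S16,
        audio_codec_ctx->sample_rate,
        av_get_default_channel_layout(audio_codec_ctx->channels),
        audio_codec_ctx->sample_fmt,      /* e.g. AV_SAMPLE_FMT_S16P */
        audio_codec_ctx->sample_rate,
        0, NULL);
swr_init(swr);

/* After each successful avcodec_decode_audio4() call: */
uint8_t *out_buf = NULL;
av_samples_alloc(&out_buf, NULL, audio_codec_ctx->channels,
                 frame->nb_samples, AV_SAMPLE_FMT_S16, 0);
int out_samples = swr_convert(swr, &out_buf, frame->nb_samples,
                              (const uint8_t **)frame->extended_data,
                              frame->nb_samples);
/* out_buf now holds out_samples * channels interleaved s16 samples. */

av_freep(&out_buf);
swr_free(&swr);
This keeps the decoder outputting its native s16p and converts only at the boundary where your player needs interleaved PCM.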