adding delay to an audio stream of a live feed in ffmpeg

I am currently capturing video via a Blackmagic DeckLink card on macOS. My audio and video are out of sync: the audio is ahead by about a second, and I suspect the video lags because of encoding latency. My plan is to delay the audio using ffmpeg's adelay filter. I originally added -af "adelay=1000|1000" to my command to delay the audio by 1000 ms, but I found that this audio filter did nothing. I then tried to build a filter_complex, but that failed as well: my command produces so many streams that ffmpeg can't route them to the proper RTP endpoints. So, what is the best way to delay the audio, and can I select which streams map to which RTP endpoints?
ffmpeg \
-format_code 23ps \
-f decklink \
-i "DeckLink HD Extreme 3" \
-filter_complex "[0:a] adelay=2s|2s [delayed]" \
-map [delayed] -map 0:v \
-r 24 \
-g 1 \
-s 1920x1080 \
-quality realtime \
-speed 8 \
-threads 8 \
-row-mt 1 \
-tile-columns 2 \
-frame-parallel 1 \
-qmin 30 \
-qmax 35 \
-b:v 2000k \
-pix_fmt yuv420p \
-c:v libvpx-vp9 \
-strict experimental \
-an -f rtp rtp://myurl.com:5004?pkt_size=1300 \
-c:a libopus \
-b:a 128k \
-vn -f rtp rtp://myurl.com:5002?pkt_size=1300
Here is the full log from running the command without any delay, i.e. with these two lines omitted:
-filter_complex "[0:a] adelay=2s|2s [delayed]" \
-map [delayed] -map 0:v \
ffmpeg version N-97362-g889ad93c88 Copyright (c) 2000-2020 the FFmpeg developers
built with Apple LLVM version 9.0.0 (clang-900.0.39.2)
configuration: --prefix=/usr/local --pkg-config-flags=--static --extra-cflags='-fno-stack-check -I/Users/admin/Documents/ffmpeg_build/include -I/Users/admin/Documents/BDS/Mac/include' --extra-ldflags=-L/Users/admin/Documents/ffmpeg_build/lib --extra-libs='-lpthread -lm' --bindir=/Users/admin/Documents/ffmpeg_build/bin --enable-gpl --enable-libass --enable-libfdk-aac --enable-libfreetype --enable-libmp3lame --enable-libopus --enable-libvorbis --enable-libvpx --enable-libx264 --enable-libx265 --enable-nonfree --enable-decklink
libavutil 56. 42.102 / 56. 42.102
libavcodec 58. 80.100 / 58. 80.100
libavformat 58. 42.100 / 58. 42.100
libavdevice 58. 9.103 / 58. 9.103
libavfilter 7. 77.101 / 7. 77.101
libswscale 5. 6.101 / 5. 6.101
libswresample 3. 6.100 / 3. 6.100
libpostproc 55. 6.100 / 55. 6.100
[decklink # 0x7fcfb2000000] Found Decklink mode 1920 x 1080 with rate 23.98
[decklink # 0x7fcfb2000000] Frame received (#2) - No input signal detected - Frames dropped 1
Guessed Channel Layout for Input Stream #0.0 : stereo
Input #0, decklink, from 'DeckLink HD Extreme 3':
Duration: N/A, start: 0.000000, bitrate: 797002 kb/s
Stream #0:0: Audio: pcm_s16le, 48000 Hz, stereo, s16, 1536 kb/s
Stream #0:1: Video: rawvideo (UYVY / 0x59565955), uyvy422(progressive), 1920x1080, 795466 kb/s, 23.98 tbr, 1000k tbn, 1000k tbc
[decklink # 0x7fcfb2000000] Frame received (#3) - Input returned - Frames dropped 2
Stream mapping:
Stream #0:1 -> #0:0 (rawvideo (native) -> vp9 (libvpx-vp9))
Stream #0:0 -> #1:0 (pcm_s16le (native) -> opus (libopus))
Press [q] to stop, [?] for help
[libvpx-vp9 # 0x7fcfb180d200] v1.8.2
Output #0, rtp, to 'rtp://myurl.com:5004?pkt_size=1300':
Metadata:
encoder : Lavf58.42.100
Stream #0:0: Video: vp9 (libvpx-vp9), yuv420p, 1920x1080, q=30-35, 2000 kb/s, 24 fps, 90k tbn, 24 tbc
Metadata:
encoder : Lavc58.80.100 libvpx-vp9
Side data:
cpb: bitrate max/min/avg: 0/0/0 buffer size: 0 vbv_delay: N/A
Output #1, rtp, to 'rtp://myurl.com:5002?pkt_size=1300':
Metadata:
encoder : Lavf58.42.100
Stream #1:0: Audio: opus (libopus), 48000 Hz, stereo, s16, 128 kb/s
Metadata:
encoder : Lavc58.80.100 libopus
SDP:
v=0
o=- 0 0 IN IP4 127.0.0.1
s=No Name
t=0 0
a=tool:libavformat 58.42.100
m=video 5004 RTP/AVP 96
c=IN IP4 54.183.58.143
b=AS:2000
a=rtpmap:96 VP9/90000
m=audio 5002 RTP/AVP 97
c=IN IP4 54.183.58.143
b=AS:128
a=rtpmap:97 opus/48000/2
a=fmtp:97 sprop-stereo=1
frame= 434 fps= 24 q=0.0 size= 37063kB time=00:00:18.09 bitrate=16780.7kbits/s speed=1.01x

The -map option is positional: it applies to the output URL that follows it. So the delayed audio should be mapped after the first output URL and before the second one.
ffmpeg \
-format_code 23ps \
-f decklink \
-i "DeckLink HD Extreme 3" \
-filter_complex "[0:a] adelay=2s|2s [delayed]" \
-map 0:v \
-r 24 \
-g 1 \
-s 1920x1080 \
-quality realtime \
-speed 8 \
-threads 8 \
-row-mt 1 \
-tile-columns 2 \
-frame-parallel 1 \
-qmin 30 \
-qmax 35 \
-b:v 2000k \
-pix_fmt yuv420p \
-c:v libvpx-vp9 \
-strict experimental \
-an -f rtp rtp://myurl.com:5004?pkt_size=1300 \
-map [delayed] \
-c:a libopus \
-b:a 128k \
-vn -f rtp rtp://myurl.com:5002?pkt_size=1300
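A side note on units (my addition, not part of the original answer): adelay takes its delays in plain milliseconds by default, and the "2s" duration suffix is only accepted by reasonably recent ffmpeg builds. If your build rejects it, the millisecond form should behave the same:
-filter_complex "[0:a] adelay=2000|2000 [delayed]" \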

Related

FFMPEG: Specifying Output Stream Type When Combining Multiple Filters

I currently have 3 separate ffmpeg commands that do the following:
1) Overlay a watermark on a video: ffmpeg -i samplegreen.webm -i foregrounds/myimage.png -r 30 -filter_complex "overlay=(W-w)/2:H-h" -af "adelay=700" output.mp4
2) Overlay the result of 1) onto a beach video: ffmpeg -i backgrounds/beachsunsetmp4.mp4 -i output.mp4 -filter_complex "[1:v]chromakey=0x005d0b:0.1485:0.03[ckout];[0:v][ckout]overlay[o]" -map [o] -map 1:a -shortest somefolder/sample_video.mp4
3) Merge the audio of the result of 2) with another audio file: ffmpeg -i somefolder/sample_video.mp4 -i backgrounds/beachsunsetmp4.mp3 -filter_complex '[0:a][1:a]amerge=inputs=2[a]' -map 0:v -map '[a]' -c:v copy -ac 2 -shortest anotherfolder/sample_video.mp4
Now, this all works as intended; however, I was looking into combining them all into a single command with all the filters, like so:
ffmpeg -i samplegreen.webm -i foregrounds/myimage.png -r 30 -i backgrounds/beachsunsetmp4.mp4 -i backgrounds/beachsunsetmp4.mp3 -filter_complex \
"[0]overlay=(W-w)/2:H-h[output_1]; \
[output_1]chromakey=0x005d0b:0.1485:0.03[ckout]; \
[2:v][ckout]overlay[output_2]; \
[output_2][3:a] amerge=inputs=2 [output_3]" \
-af "adelay=700" -map [output_3] shortest final.mp4
It fails with the following error (Media type mismatch between the 'Parsed_overlay_2' filter output pad 0 (video) and the 'Parsed_amerge_3' filter input pad 0 (audio)):
ffmpeg version 4.3.2 Copyright (c) 2000-2021 the FFmpeg developers
built with Apple clang version 11.0.0 (clang-1100.0.33.17)
configuration: --prefix=/usr/local/Cellar/ffmpeg/4.3.2_1 --enable-shared --enable-pthreads --enable-version3 --enable-avresample --cc=clang --host-cflags= --host-ldflags= --enable-ffplay --enable-gnutls --enable-gpl --enable-libaom --enable-libbluray --enable-libdav1d --enable-libmp3lame --enable-libopus --enable-librav1e --enable-librubberband --enable-libsnappy --enable-libsrt --enable-libtesseract --enable-libtheora --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libxvid --enable-lzma --enable-libfontconfig --enable-libfreetype --enable-frei0r --enable-libass --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-librtmp --enable-libspeex --enable-libsoxr --enable-libzmq --enable-libzimg --disable-libjack --disable-indev=jack --enable-videotoolbox
libavutil 56. 51.100 / 56. 51.100
libavcodec 58. 91.100 / 58. 91.100
libavformat 58. 45.100 / 58. 45.100
libavdevice 58. 10.100 / 58. 10.100
libavfilter 7. 85.100 / 7. 85.100
libavresample 4. 0. 0 / 4. 0. 0
libswscale 5. 7.100 / 5. 7.100
libswresample 3. 7.100 / 3. 7.100
libpostproc 55. 7.100 / 55. 7.100
Input #0, matroska,webm, from 'samplegreen.webm':
Metadata:
encoder : Chrome
Duration: N/A, start: 0.000000, bitrate: N/A
Stream #0:0(eng): Video: vp8, yuv420p(progressive), 1280x720, SAR 1:1 DAR 16:9, 1k tbr, 1k tbn, 1k tbc (default)
Metadata:
alpha_mode : 1
Stream #0:1(eng): Audio: opus, 48000 Hz, mono, fltp (default)
Input #1, png_pipe, from 'foregrounds/myimage.png':
Duration: N/A, bitrate: N/A
Stream #1:0: Video: png, rgba(pc), 350x86, 25 tbr, 25 tbn, 25 tbc
Input #2, mov,mp4,m4a,3gp,3g2,mj2, from 'backgrounds/beachsunsetmp4.mp4':
Metadata:
major_brand : mp42
minor_version : 0
compatible_brands: mp42mp41
creation_time : 2021-02-16T18:24:40.000000Z
Duration: 00:00:32.53, start: 0.000000, bitrate: 3032 kb/s
Stream #2:0(eng): Video: h264 (Main) (avc1 / 0x31637661), yuv420p(tv, bt709), 1280x720, 3027 kb/s, 29.97 fps, 29.97 tbr, 30k tbn, 59.94 tbc (default)
Metadata:
creation_time : 2021-02-16T18:24:40.000000Z
handler_name : ?Mainconcept Video Media Handler
encoder : AVC Coding
[mp3 # 0x7f86cf809000] Estimating duration from bitrate, this may be inaccurate
Input #3, mp3, from 'backgrounds/beachsunsetmp4.mp3':
Metadata:
date : 2021-02-18 06:49
id3v2_priv.XMP : <?xpacket begin="\xef\xbb\xbf" id="W5M0MpCehiHzreSzNTczkc9d"?>\x0a<x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="Adobe XMP Core 6.0-c003 79.164527, 2020/10/15-17:48:32 ">\x0a <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">\x0a <rdf
Duration: 00:00:32.60, start: 0.000000, bitrate: 132 kb/s
Stream #3:0: Audio: mp3, 48000 Hz, stereo, fltp, 128 kb/s
[Parsed_overlay_2 # 0x7f86cd4039c0] Media type mismatch between the 'Parsed_overlay_2' filter output pad 0 (video) and the 'Parsed_amerge_3' filter input pad 0 (audio)
[AVFilterGraph # 0x7f86cd402a40] Cannot create the link overlay:0 -> amerge:0
Error initializing complex filters.
Invalid argument
As far as I can tell, the issue is that the filter, amerge, wants 2 audio streams. Normally, I could take the input stream argument (which is a video), and make it use the audio by doing something like [0:a][1:a]amerge=inputs=2[results]. However, since my input stream is the output of a preceding filter, that doesn't seem to work (i.e. [output_2:a]). It bombs out with:
[matroska,webm # 0x7fecca000000] Invalid stream specifier: output_2:a.
Last message repeated 1 times
Stream specifier 'output_2:a' in filtergraph description [0]overlay=(W-w)/2:H-h[output_1]; [output_1]chromakey=0x005d0b:0.1485:0.03[ckout]; [2:v][ckout]overlay[output_2]; [output_2:a][3:a] amerge=inputs=2 [output_3] matches no streams.
So all of that said... Is there a way to specify that I'd like to use the audio stream from the output of a preceding filter? Or any other ways to combine all of these filters into a single command?
Thanks.
Any help would be greatly appreciated!
Except for a few filters like concat, a filter will take either only video inputs or only audio inputs.
Here's the combined command.
ffmpeg \
-i samplegreen.webm \
-i foregrounds/myimage.png \
-i backgrounds/beachsunsetmp4.mp4 \
-i backgrounds/beachsunsetmp4.mp3 \
-filter_complex \
"[0][1]overlay=(W-w)/2:H-h,chromakey=0x005d0b:0.1485:0.03[ckout]; \
[2][ckout]overlay=shortest=1[v]; \
[0]adelay=700:all=1[0a]; \
[0a][3]amerge=inputs=2[a]" \
-map '[v]' -map '[a]' \
-shortest -r 30 -ac 2 \
output.mp4
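One thing worth noting about the amerge line (my addition, not part of the original answer): amerge keeps every input channel, so merging the mono webm audio with the stereo mp3 yields 3 channels, which -ac 2 then downmixes. If a plain mix is acceptable, amix should also work as a drop-in for the last filtergraph line, with ffmpeg auto-inserting the conversions needed to give both inputs a common layout:
[0a][3]amix=inputs=2[a]
amix averages its inputs, so you may want a volume filter after it to compensate for the level drop.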

ffmpeg change the order of the output

I am trying to change the order of the output streams for 3 inputs (2 audio + 1 video).
This is my command:
/usr/bin/ffmpeg -async 1 \
-f pulse -i alsa_output.pci-0000_00_1b.0.analog-stereo.monitor \
-f pulse -i alsa_input.pci-0000_00_1b.0.analog-stereo \
-f x11grab -video_size 1920x1080 -framerate 8 -i :0.0 \
-filter_complex amix=inputs=2 \
-c:a aac -b:a 128k \
-c:v h264_nvenc -b:v 1500k -maxrate 1500k -minrate 1500k \
-override_ffserver -g 16 http://10.100.102.109:8090/feed1.ffm
This command works, but the first output stream is the audio, so my third-party app can't play the output.
This is my output:
Stream mapping:
Stream #0:0 (pcm_s16le) -> amix:input0 (graph 0)
Stream #1:0 (pcm_s16le) -> amix:input1 (graph 0)
amix (graph 0) -> Stream #0:0 (aac)
Stream #2:0 -> #0:1 (rawvideo (native) -> h264 (h264_nvenc))
Press [q] to stop, [?] for help
-async is forwarded to lavfi similarly to -af aresample=async=1:min_hard_comp=0.100000:first_pts=0.
Last message repeated 1 times
Output #0, ffm, to 'http://10.100.102.109:8090/feed1.ffm':
Metadata:
creation_time : now
encoder : Lavf57.83.100
Stream #0:0: Audio: aac (LC), 48000 Hz, stereo, fltp, 128 kb/s (default)
Metadata:
encoder : Lavc57.107.100 aac
Stream #0:1: Video: h264 (h264_nvenc) (Main), bgr0, 1920x1080, q=-1--1, 1500 kb/s, 8 fps, 1000k tbn, 8 tbc
Metadata:
encoder : Lavc57.107.100 h264_nvenc
Side data:
cpb: bitrate max/min/avg: 1500000/0/1500000 buffer size: 3000000 vbv_delay: -1
How can I reorder the output so that the video stream comes first?
(When I run this command with 1 audio input and 1 video input, the output is fine: the video is first and the third-party app can play it.)
I have spent a lot of hours on this, please help me.
Thanks a lot.
In the absence of mapping, output streams from complex filtergraphs are placed before all other streams. So, add a label to the filter_complex output and map the streams in the required order.
Use
/usr/bin/ffmpeg -async 1 \
-f pulse -i alsa_output.pci-0000_00_1b.0.analog-stereo.monitor \
-f pulse -i alsa_input.pci-0000_00_1b.0.analog-stereo \
-f x11grab -video_size 1920x1080 -framerate 8 -i :0.0 \
-filter_complex "amix=inputs=2[a]" \
-map 2:v -map '[a]' \
-c:a aac -b:a 128k \
-c:v h264_nvenc -b:v 1500k -maxrate 1500k -minrate 1500k \
-override_ffserver -g 16 http://10.100.102.109:8090/feed1.ffm
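If you want to verify the resulting stream order, ffprobe can print each stream's index and type; test.mkv below stands in as a hypothetical local test output, since probing the live ffm feed is less convenient:
ffprobe -v error -show_entries stream=index,codec_type -of csv=p=0 test.mkv
With the mapping above, stream 0 should come back as video and stream 1 as audio.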

FFmpeg split filter with audio filtering throws errors

When using FFmpeg's split filter for video tracks, I want to filter the audio track as well. I tested asplit but I'm not sure where to use it in the filter chain.
When running this command:
ffmpeg -y -probesize 100M -analyzeduration 5000000 -hide_banner -i $input -i $logo \
-filter_complex "[0:a]aformat=channel_layouts=stereo,aresample=async=1000[a1];[0:v]overlay=20:20,drawtext=fontfile=$font:text='some text':fontcolor=c1ff30:fontsize=50:x=250:y=100,split=3[v1][v2][v3];[v1]setpts=PTS-STARTPTS,yadif=0:-1:0,scale=w=640:h=360:force_original_aspect_ratio=decrease:sws_dither=ed:flags=lanczos,setdar=16/9[v1];[v2]setpts=PTS-STARTPTS,yadif=0:-1:0,scale=w=1024:h=576:force_original_aspect_ratio=decrease:sws_dither=ed:flags=lanczos,setdar=16/9[v2];[v3]setpts=PTS-STARTPTS,yadif=0:-1:0,scale=w=1600:h=900:force_original_aspect_ratio=decrease:sws_dither=ed:flags=lanczos,setdar=16/9[v3]" \
-map "[v1]" -map "[a1]" -c:a libfdk_aac -ac 2 -b:a 128k -ar 48000 -c:v libx264 -crf 23 -maxrate 550k -bufsize 1100k -bsf:v h264_mp4toannexb -forced-idr 1 -sc_threshold 0 -r 25 -g 50 -keyint_min 50 -preset medium -profile:v main -level 3.1 -coder 1 -pix_fmt yuv420p -flags +loop+mv4+cgop -flags2 +local_header -movflags faststart -cmp chroma -hls_time 6 -hls_playlist_type vod /dir/1.m3u8 \
-map "[v2]" -map "[a1]" -c:a libfdk_aac -ac 2 -b:a 128k -ar 48000 -c:v libx264 -crf 23 -maxrate 1400k -bufsize 2800k -bsf:v h264_mp4toannexb -forced-idr 1 -sc_threshold 0 -r 25 -g 50 -preset medium -profile:v main -level 4 -coder 1 -pix_fmt yuv420p -flags +loop+mv4+cgop -flags2 +local_header -movflags faststart -cmp chroma -keyint_min 50 -hls_time 6 -hls_playlist_type vod /dir/2.m3u8 \
-map "[v3]" -map "[a1]" -c:a libfdk_aac -ac 2 -b:a 128k -ar 48000 -c:v libx264 -crf 23 -maxrate 3100k -bufsize 6200k -bsf:v h264_mp4toannexb -forced-idr 1 -sc_threshold 0 -r 25 -g 50 -preset medium -profile:v high -level 3.1 -coder 1 -pix_fmt yuv420p -flags +loop+mv4+cgop -flags2 +local_header -movflags faststart -cmp chroma -keyint_min 50 -hls_time 6 -hls_playlist_type vod /dir/3.m3u8
FFmpeg throws this error:
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '/Volumes/aaa/bbb/file.mov':
Metadata:
major_brand : qt
minor_version : 512
compatible_brands: qt
encoder : Lavf58.20.100
Duration: 00:00:10.00, start: 0.000000, bitrate: 117945 kb/s
Stream #0:0(eng): Video: prores (apcn / 0x6E637061), yuv422p10le(tv, bt709, top coded first (swapped)), 1920x1080, 115636 kb/s, SAR 1:1 DAR 16:9, 25 fps, 25 tbr, 10k tbn, 10k tbc (default)
Metadata:
handler_name : Telestream, LLC Telestream Media Framework - Local 99.99.999999
encoder : Apple ProRes 422
timecode : 01:25:44:05
Stream #0:1(eng): Audio: pcm_s24le (in24 / 0x34326E69), 48000 Hz, stereo, s32 (24 bit), 2304 kb/s (default)
Metadata:
handler_name : Telestream, LLC Telestream Media Framework - Local 99.99.999999
Stream #0:2(eng): Data: none (tmcd / 0x64636D74), 0 kb/s
Metadata:
handler_name : Telestream, LLC Telestream Media Framework - Local 99.99.999999
timecode : 01:25:44:05
Input #1, png_pipe, from '/Volumes/aaa/bbb/logo.png':
Duration: N/A, bitrate: N/A
Stream #1:0: Video: png, rgba(pc), 1920x1080 [SAR 2835:2835 DAR 16:9], 25 tbr, 25 tbn, 25 tbc
Output with label 'a1' does not exist in any defined filter graph, or was already used elsewhere.
When removing the audio filtering ([0:a]aformat=channel_layouts=stereo,aresample=async=1000[a1]) and mapping 0:a as audio, the command runs fine.
What am I missing?
Filtergraph outputs can be used only once. You'll have to clone the audio output for multiple use.
First,
[0:a]aformat=channel_layouts=stereo,aresample=async=1000,asplit=3[a1][a2][a3]
and then map a1, a2, a3 as required.
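Putting that together, each rendition then maps its own audio copy. A trimmed sketch of the three output sections (the original rate-control, profile, bitstream-filter and keyframe options go back in per rendition exactly as in the question):
-map "[v1]" -map "[a1]" -c:v libx264 -c:a libfdk_aac -hls_time 6 -hls_playlist_type vod /dir/1.m3u8 \
-map "[v2]" -map "[a2]" -c:v libx264 -c:a libfdk_aac -hls_time 6 -hls_playlist_type vod /dir/2.m3u8 \
-map "[v3]" -map "[a3]" -c:v libx264 -c:a libfdk_aac -hls_time 6 -hls_playlist_type vod /dir/3.m3u8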

Concatenate audio with image and video using ffmpeg

I have 1 image, 1 audio file and 1 video. I would like to merge all of them to make a video which will
show the image and play audio file for the first 10s
play the video file
here is what I was trying to do so far.
ffmpeg \
-loop 1 -framerate 24 -t 10 -i item1.jpg \
-i "https://audio-ssl.itunes.apple.com/apple-assets-us-std-000001/Music/66/58/f7/mzi.eoocfriy.aac.p.m4a" \
-i item4.mp4 \
-filter_complex \
"[0]scale=432:432,setdar=1[img1]; \
[1]volume=1[aud1]; \
[2]scale=432:432,setdar=1[vid1]; \
[img1][aud1][vid1] concat=n=3:v=1:a=1" \
outputfile.mp4
I got the error:
[Parsed_setdar_4 # 0x3063780] Media type mismatch between the
'Parsed_setdar_4' filter output pad 0 (video) and the
'Parsed_concat_6' filter input pad 1 (audio) [AVFilterGraph #
0x30479a0] Cannot create the link setdar:0 -> concat:1 Error
initializing complex filters. Invalid argument
I tried Googling but still cannot figure out what I am doing wrong.
Updated:
I ran the following command:
ffmpeg \
-loop 1 -framerate 24 -t 10 -i item1.jpg \
-t 10 -i "https://audio-ssl.itunes.apple.com/apple-assets-us-std-000001/Music/66/58/f7/mzi.eoocfriy.aac.p.m4a" \
-i item4.mp4 \
-f lavfi -t 1 -i anullsrc \
-filter_complex \
"[0]scale=432:432,setsar=1[img1]; \
[2]scale=432:432,setsar=1[vid1]; \
[img1][1][vid1][3] concat=n=2:v=1:a=1" \
outputfile.mp4
and got the following error:
ffmpeg version 3.3.3 Copyright (c) 2000-2017 the FFmpeg developers
built with gcc 4.8 (Ubuntu 4.8.4-2ubuntu1~14.04.3)
configuration: --extra-libs=-ldl --prefix=/opt/ffmpeg --mandir=/usr/share/man --enable-avresample --disable-debug --enable-nonfree --enable-gpl --enable-version3 --enable-libopencore-amrnb --enable-libopencore-amrwb --disable-decoder=amrnb --disable-decoder=amrwb --enable-libpulse --enable-libfreetype --enable-gnutls --disable-ffserver --enable-libx264 --enable-libx265 --enable-libfdk-aac --enable-libvorbis --enable-libtheora --enable-libmp3lame --enable-libopus --enable-libvpx --enable-libspeex --enable-libass --enable-avisynth --enable-libsoxr --enable-libxvid --enable-libvidstab --enable-libwavpack --enable-nvenc --enable-libzimg
libavutil 55. 58.100 / 55. 58.100
libavcodec 57. 89.100 / 57. 89.100
libavformat 57. 71.100 / 57. 71.100
libavdevice 57. 6.100 / 57. 6.100
libavfilter 6. 82.100 / 6. 82.100
libavresample 3. 5. 0 / 3. 5. 0
libswscale 4. 6.100 / 4. 6.100
libswresample 2. 7.100 / 2. 7.100
libpostproc 54. 5.100 / 54. 5.100
Input #0, image2, from 'item1.jpg':
Duration: 00:00:00.04, start: 0.000000, bitrate: 8365 kb/s
Stream #0:0: Video: mjpeg, yuvj420p(pc, bt470bg/unknown/unknown), 432x432 [SAR 1:1 DAR 1:1], 24 fps, 24 tbr, 24 tbn, 24 tbc
Input #1, mov,mp4,m4a,3gp,3g2,mj2, from 'https://audio-ssl.itunes.apple.com/apple-assets-us-std-000001/Music/66/58/f7/mzi.eoocfriy.aac.p.m4a':
Metadata:
major_brand : M4A
minor_version : 0
compatible_brands: M4A mp42isom
creation_time : 1983-06-16T23:20:44.000000Z
iTunSMPB : 00000000 00000840 00000000 00000000001423C0 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Duration: 00:00:29.98, start: 0.047891, bitrate: 285 kb/s
Stream #1:0(eng): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 271 kb/s (default)
Metadata:
creation_time : 1983-06-16T23:20:44.000000Z
Input #2, mov,mp4,m4a,3gp,3g2,mj2, from 'item4.mp4':
Metadata:
major_brand : isom
minor_version : 512
compatible_brands: isomiso2avc1mp41
creation_time : 1970-01-01T00:00:00.000000Z
encoder : Lavf53.24.2
Duration: 00:00:13.70, start: 0.000000, bitrate: 615 kb/s
Stream #2:0(und): Video: h264 (Main) (avc1 / 0x31637661), yuv420p, 320x240 [SAR 1:1 DAR 4:3], 229 kb/s, 15 fps, 15 tbr, 15360 tbn, 30 tbc (default)
Metadata:
creation_time : 1970-01-01T00:00:00.000000Z
handler_name : VideoHandler
Stream #2:1(und): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, 5.1, fltp, 382 kb/s (default)
Metadata:
creation_time : 1970-01-01T00:00:00.000000Z
handler_name : SoundHandler
Input #3, lavfi, from 'anullsrc':
Duration: N/A, start: 0.000000, bitrate: 705 kb/s
Stream #3:0: Audio: pcm_u8, 44100 Hz, stereo, u8, 705 kb/s
[AVFilterGraph # 0x3955e20] No such filter: ' '
Error initializing complex filters.
Invalid argument
When concatting paired streams, the concat filter expects a matching set of inputs for each segment. So, if you are concatting 1 video and 2 audio streams per segment, each segment's inputs should be [v][a][a].
So, in this case, a dummy audio is required to pair with the 2nd video.
ffmpeg \
-loop 1 -framerate 24 -t 10 -i item1.jpg \
-t 10 -i "https://audio-ssl.itunes.apple.com/apple-assets-us-std-000001/Music/66/58/f7/mzi.eoocfriy.aac.p.m4a" \
-i item4.mp4 \
-f lavfi -t 1 -i anullsrc \
-filter_complex \
"[0]scale=432:432,setsar=1[img1]; \
[2]scale=432:432,setsar=1[vid1]; \
[img1][1][vid1][3] concat=n=2:v=1:a=1" \
outputfile.mp4
The anullsrc provides the dummy audio.
The intro audio has to be limited to the image duration, since the concat filter uses the duration of the longer stream in each segment.
Use setsar not setdar since SAR is the actual parameter that is changed and it's possible that after reduction to a rational number, the SARs may not match.
n in concat should be 2 since it specifies the number of paired segments, not total number of inputs.
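So for this command, with n=2, v=1 and a=1, the filtergraph feeds concat two [video][audio] pairs in order; the general pattern (the labels here are illustrative) is:
[v0][a0][v1][a1]concat=n=2:v=1:a=1[v][a]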

Merging multichannel audio tracks from Mumble with ffmpeg

We record talks through Mumble, and because Mumble has a nifty multichannel feature, I figured we could get subtitles from YouTube by uploading each track to YouTube separately with:
for file in *; do ffmpeg -loop 1 -r 2 -i "$img" -i "$file" -vf scale=-1:380 -c:v libx264 -preset slow -tune stillimage -crf 18 -c:a copy -shortest -pix_fmt yuv420p -threads 0 "$file".mkv; done
With, e.g., a sed shell script I can then prepend a nickname for each speaker in the automatic captions, i.e. the subtitles from YouTube. Works like a charm.
But merging those tracks with ffmpeg gets tricky. I use:
ffmpeg -i input1.ogg -i input2.ogg -i input3.ogg -i input4.ogg -i input5.ogg -filter_complex "[0:a][1:a][2:a][3:a][4:a] amerge=inputs=5[aout]" -map "[aout]" -ac 2 output.ogg
Somehow ffmpeg shortens the resulting audio track and I don't yet have an idea why. I tried putting the longest track first and last, since including silent tracks made the mixdown even shorter. Here are the warnings:
[Parsed_amerge_0 # 0x7f8b29f02d20] No channel layout for input 1
[Parsed_amerge_0 # 0x7f8b29f02d20] Input channel layouts overlap: output layout will be determined by the number of distinct input channels
But it says
[Parsed_amerge_0 # 0x7f8b29f02d20] No channel layout for input 1
even when I change the order of inputs.
Although, according to Mumble's documentation, the tracks should be of equal length, VLC's media info shows different track times. However, the tracks are not out of sync, just cut off at the end.
I also have no idea why ffmpeg mentions FLAC; all the files are Vorbis.
ffmpeg -i Mumble-2017-09-09-16-33-18-149.210.187.155-chrisaiki2.ogg -i Mumble-2017-09-09-16-33-18-149.210.187.155-Recorder.ogg -i Mumble-2017-09-09-16-33-18-149.210.187.155-steempowerpics.ogg -i Mumble-2017-09-09-16-33-18-149.210.187.155-Taconator.ogg -i Mumble-2017-09-09-16-33-18-149.210.187.155-fuzzynewest.ogg -filter_complex "[0:a][1:a][2:a][3:a][4:a] amerge=inputs=5[aout]" -map "[aout]" -ac 2 output5.ogg
ffmpeg version 2.8.4 Copyright (c) 2000-2015 the FFmpeg developers
built with Apple LLVM version 7.0.2 (clang-700.1.81)
configuration: --prefix=/usr/local/Cellar/ffmpeg/2.8.4 --enable-shared --enable-pthreads --enable-gpl --enable-version3 --enable-hardcoded-tables --enable-avresample --cc=clang --host-cflags= --host-ldflags= --enable-opencl --enable-libx264 --enable-libmp3lame --enable-libvo-aacenc --enable-libxvid --enable-vda
libavutil 54. 31.100 / 54. 31.100
libavcodec 56. 60.100 / 56. 60.100
libavformat 56. 40.101 / 56. 40.101
libavdevice 56. 4.100 / 56. 4.100
libavfilter 5. 40.101 / 5. 40.101
libavresample 2. 1. 0 / 2. 1. 0
libswscale 3. 1.101 / 3. 1.101
libswresample 1. 2.101 / 1. 2.101
libpostproc 53. 3.100 / 53. 3.100
Input #0, ogg, from 'Mumble-2017-09-09-16-33-18-149.210.187.155-chrisaiki2.ogg':
Duration: 00:40:01.19, start: 0.000000, bitrate: 17 kb/s
Stream #0:0: Audio: vorbis, 48000 Hz, mono, fltp, 86 kb/s
Metadata:
ENCODER : libsndfile
TITLE : chrisaiki2
Input #1, ogg, from 'Mumble-2017-09-09-16-33-18-149.210.187.155-Recorder.ogg':
Duration: 00:33:57.88, start: 0.000000, bitrate: 1 kb/s
Stream #1:0: Audio: vorbis, 48000 Hz, mono, fltp, 86 kb/s
Metadata:
ENCODER : libsndfile
TITLE : Recorder
Input #2, ogg, from 'Mumble-2017-09-09-16-33-18-149.210.187.155-steempowerpics.ogg':
Duration: 00:33:53.93, start: 0.000000, bitrate: 1 kb/s
Stream #2:0: Audio: vorbis, 48000 Hz, mono, fltp, 86 kb/s
Metadata:
ENCODER : libsndfile
TITLE : steempowerpics
Input #3, ogg, from 'Mumble-2017-09-09-16-33-18-149.210.187.155-Taconator.ogg':
Duration: 00:35:36.37, start: 0.000000, bitrate: 6 kb/s
Stream #3:0: Audio: vorbis, 48000 Hz, mono, fltp, 86 kb/s
Metadata:
ENCODER : libsndfile
TITLE : Taconator
Input #4, ogg, from 'Mumble-2017-09-09-16-33-18-149.210.187.155-fuzzynewest.ogg':
Duration: 00:41:53.23, start: 0.000000, bitrate: 30 kb/s
Stream #4:0: Audio: vorbis, 48000 Hz, mono, fltp, 86 kb/s
Metadata:
ENCODER : libsndfile
TITLE : fuzzynewest
File 'output5.ogg' already exists. Overwrite ? [y/N] y
[Parsed_amerge_0 # 0x7f8b29f02d20] No channel layout for input 1
[Parsed_amerge_0 # 0x7f8b29f02d20] Input channel layouts overlap: output layout will be determined by the number of distinct input channels
[flac # 0x7f8b2b005600] encoding as 24 bits-per-sample
Output #0, ogg, to 'output5.ogg':
Metadata:
encoder : Lavf56.40.101
Stream #0:0: Audio: flac, 48000 Hz, stereo, s32 (24 bit), 128 kb/s (default)
Metadata:
encoder : Lavc56.60.100 flac
Stream mapping:
Stream #0:0 (vorbis) -> amerge:in0
Stream #1:0 (vorbis) -> amerge:in1
Stream #2:0 (vorbis) -> amerge:in2
Stream #3:0 (vorbis) -> amerge:in3
Stream #4:0 (vorbis) -> amerge:in4
amerge -> Stream #0:0 (flac)
Press [q] to stop, [?] for help
size= 100900kB time=00:33:53.94 bitrate= 406.4kbits/s
video:0kB audio:100441kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.457024%
Mumble multichannel talk on reddit
The amerge documentation states:
If inputs do not have the same duration, the output will stop with the
shortest.
amix may be a better filter for this case.
I used amix in the end like this:
ffmpeg -i input1.ogg -i input2.ogg -i input3.ogg -i input4.ogg -i input5.ogg -filter_complex "[0:a][1:a][2:a][3:a][4:a] amix=inputs=5:duration=longest[aout]" -map "[aout]" -ac 2 -c:a libvorbis -b:a 128k output.ogg
ffmpeg didn't recognize libvorbis so I had to reinstall with brew first: brew reinstall ffmpeg --with-libvorbis
I then used ffmpeg -loop 1 -r 2 -i "$img" -i "$snd" -vf scale=-1:380 -c:v libx264 -preset slow -tune stillimage -crf 18 -c:a copy -shortest -pix_fmt yuv420p -threads 0 output.mkv to upload the mixed audio tracks to YouTube.
I had merged the subtitles which were generated with YouTube as well and I just added those to the resulting video. Works like a charm.
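A caveat I'd add (not part of the original thread): amix scales each input down so the sum can't clip, so a 5-input mix can come out noticeably quiet. Appending a volume filter is a simple compensation (newer ffmpeg builds also expose a normalize option on amix):
ffmpeg -i input1.ogg -i input2.ogg -i input3.ogg -i input4.ogg -i input5.ogg -filter_complex "[0:a][1:a][2:a][3:a][4:a] amix=inputs=5:duration=longest,volume=5 [aout]" -map "[aout]" -ac 2 -c:a libvorbis -b:a 128k output.ogg
The factor of 5 matches the number of inputs and is just a starting point; adjust by ear.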
