Concatenate audio with image and video using ffmpeg - linux

I have 1 image, 1 audio file and 1 video. I would like to merge all of them to make a video which will
show the image and play audio file for the first 10s
play the video file
here is what I was trying to do so far.
ffmpeg \
-loop 1 -framerate 24 -t 10 -i item1.jpg \
-i "https://audio-ssl.itunes.apple.com/apple-assets-us-std-000001/Music/66/58/f7/mzi.eoocfriy.aac.p.m4a" \
-i item4.mp4 \
-filter_complex \
"[0]scale=432:432,setdar=1[img1]; \
[1]volume=1[aud1]; \
[2]scale=432:432,setdar=1[vid1]; \
[img1][aud1][vid1] concat=n=3:v=1:a=1" \
outputfile.mp4
I got the error:
[Parsed_setdar_4 # 0x3063780] Media type mismatch between the
'Parsed_setdar_4' filter output pad 0 (video) and the
'Parsed_concat_6' filter input pad 1 (audio) [AVFilterGraph #
0x30479a0] Cannot create the link setdar:0 -> concat:1 Error
initializing complex filters. Invalid argument
I tried to googled but still cannot figure out what I am doing wrong?
Updated:
I ran the following command:
ffmpeg \
-loop 1 -framerate 24 -t 10 -i item1.jpg \
-t 10 -i "https://audio-ssl.itunes.apple.com/apple-assets-us-std-000001/Music/66/58/f7/mzi.eoocfriy.aac.p.m4a" \
-i item4.mp4 \
-f lavfi -t 1 -i anullsrc \
-filter_complex \
"[0]scale=432:432,setsar=1[img1]; \
[2]scale=432:432,setsar=1[vid1]; \
[img1][1][vid1][3] concat=n=2:v=1:a=1" \
outputfile.mp4
and got the following error:
ffmpeg version 3.3.3 Copyright (c) 2000-2017 the FFmpeg developers
built with gcc 4.8 (Ubuntu 4.8.4-2ubuntu1~14.04.3)
configuration: --extra-libs=-ldl --prefix=/opt/ffmpeg --mandir=/usr/share/man --enable-avresample --disable-debug --enable-nonfree --enable-gpl --enable-version3 --enable-libopencore-amrnb --enable-libopencore-amrwb --disable-decoder=amrnb --disable-decoder=amrwb --enable-libpulse --enable-libfreetype --enable-gnutls --disable-ffserver --enable-libx264 --enable-libx265 --enable-libfdk-aac --enable-libvorbis --enable-libtheora --enable-libmp3lame --enable-libopus --enable-libvpx --enable-libspeex --enable-libass --enable-avisynth --enable-libsoxr --enable-libxvid --enable-libvidstab --enable-libwavpack --enable-nvenc --enable-libzimg
libavutil 55. 58.100 / 55. 58.100
libavcodec 57. 89.100 / 57. 89.100
libavformat 57. 71.100 / 57. 71.100
libavdevice 57. 6.100 / 57. 6.100
libavfilter 6. 82.100 / 6. 82.100
libavresample 3. 5. 0 / 3. 5. 0
libswscale 4. 6.100 / 4. 6.100
libswresample 2. 7.100 / 2. 7.100
libpostproc 54. 5.100 / 54. 5.100
Input #0, image2, from 'item1.jpg':
Duration: 00:00:00.04, start: 0.000000, bitrate: 8365 kb/s
Stream #0:0: Video: mjpeg, yuvj420p(pc, bt470bg/unknown/unknown), 432x432 [SAR 1:1 DAR 1:1], 24 fps, 24 tbr, 24 tbn, 24 tbc
Input #1, mov,mp4,m4a,3gp,3g2,mj2, from 'https://audio-ssl.itunes.apple.com/apple-assets-us-std-000001/Music/66/58/f7/mzi.eoocfriy.aac.p.m4a':
Metadata:
major_brand : M4A
minor_version : 0
compatible_brands: M4A mp42isom
creation_time : 1983-06-16T23:20:44.000000Z
iTunSMPB : 00000000 00000840 00000000 00000000001423C0 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Duration: 00:00:29.98, start: 0.047891, bitrate: 285 kb/s
Stream #1:0(eng): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 271 kb/s (default)
Metadata:
creation_time : 1983-06-16T23:20:44.000000Z
Input #2, mov,mp4,m4a,3gp,3g2,mj2, from 'item4.mp4':
Metadata:
major_brand : isom
minor_version : 512
compatible_brands: isomiso2avc1mp41
creation_time : 1970-01-01T00:00:00.000000Z
encoder : Lavf53.24.2
Duration: 00:00:13.70, start: 0.000000, bitrate: 615 kb/s
Stream #2:0(und): Video: h264 (Main) (avc1 / 0x31637661), yuv420p, 320x240 [SAR 1:1 DAR 4:3], 229 kb/s, 15 fps, 15 tbr, 15360 tbn, 30 tbc (default)
Metadata:
creation_time : 1970-01-01T00:00:00.000000Z
handler_name : VideoHandler
Stream #2:1(und): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, 5.1, fltp, 382 kb/s (default)
Metadata:
creation_time : 1970-01-01T00:00:00.000000Z
handler_name : SoundHandler
Input #3, lavfi, from 'anullsrc':
Duration: N/A, start: 0.000000, bitrate: 705 kb/s
Stream #3:0: Audio: pcm_u8, 44100 Hz, stereo, u8, 705 kb/s
[AVFilterGraph # 0x3955e20] No such filter: ' '
Error initializing complex filters.
Invalid argument

When concatting paired streams, for each segment, the concat filter expects a corresponding pair of inputs. So, if you are concatting 1 video and 2 audio streams, each segment input should be [v][a][a].
So, in this case, a dummy audio is required to pair with the 2nd video.
ffmpeg \
-loop 1 -framerate 24 -t 10 -i item1.jpg \
-t 10 -i "https://audio-ssl.itunes.apple.com/apple-assets-us-std-000001/Music/66/58/f7/mzi.eoocfriy.aac.p.m4a" \
-i item4.mp4 \
-f lavfi -t 1 -i anullsrc \
-filter_complex \
"[0]scale=432:432,setsar=1[img1]; \
[2]scale=432:432,setsar=1[vid1]; \
[img1][1][vid1][3] concat=n=2:v=1:a=1" \
outputfile.mp4
The anullsrc provides the dummy audio.
The intro audio has to be limited to the image duration, since the concat filter uses the duration of the longer stream in each segment.
Use setsar not setdar since SAR is the actual parameter that is changed and it's possible that after reduction to a rational number, the SARs may not match.
n in concat should be 2 since it specifies the number of paired segments, not total number of inputs.

Related

FFMPEG: Specifying Output Stream Type When Combing Multiple Filters

I currently have 3 separate ffmpeg commands that do the following:
Overlay a watermark on a video: ffmpeg -i samplegreen.webm -i foregrounds/myimage.png -r 30 -filter_complex "overlay=(W-w)/2:H-h" -af "adelay=700" output.mp4
Overlay the results of 1) onto a beach video: ffmpeg -i backgrounds/beachsunsetmp4.mp4 -i output.mp4 -filter_complex "[1:v]chromakey=0x005d0b:0.1485:0.03[ckout];[0:v][ckout]overlay[o]" -map [o] -map 1:a -shortest somefolder/sample_video.mp4
Merge the audio of the results of 2) with another audio file: ffmpeg -i somefolder/sample_video.mp4 -i backgrounds/beachsunsetmp4.mp3 -filter_complex '[0:a][1:a]amerge=inputs=2[a]' -map 0:v -map '[a]' -c:v copy -ac 2 -shortest anotherfolder/sample_video.mp4
Now, this all works as intended, however, I was looking into attempting to combine them all into a single command, combining all the filters, like so:
ffmpeg -i samplegreen.webm -i foregrounds/myimage.png -r 30 -i backgrounds/beachsunsetmp4.mp4 -i backgrounds/beachsunsetmp4.mp3 -filter_complex \
"[0]overlay=(W-w)/2:H-h[output_1]; \
[output_1]chromakey=0x005d0b:0.1485:0.03[ckout]; \
[2:v][ckout]overlay[output_2]; \
[output_2][3:a] amerge=inputs=2 [output_3]" \
-af "adelay=700" -map [output_3] shortest final.mp4
It fails with the following error (Media type mismatch between the 'Parsed_overlay_2' filter output pad 0 (video) and the 'Parsed_amerge_3' filter input pad 0 (audio)):
ffmpeg version 4.3.2 Copyright (c) 2000-2021 the FFmpeg developers
built with Apple clang version 11.0.0 (clang-1100.0.33.17)
configuration: --prefix=/usr/local/Cellar/ffmpeg/4.3.2_1 --enable-shared --enable-pthreads --enable-version3 --enable-avresample --cc=clang --host-cflags= --host-ldflags= --enable-ffplay --enable-gnutls --enable-gpl --enable-libaom --enable-libbluray --enable-libdav1d --enable-libmp3lame --enable-libopus --enable-librav1e --enable-librubberband --enable-libsnappy --enable-libsrt --enable-libtesseract --enable-libtheora --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libxvid --enable-lzma --enable-libfontconfig --enable-libfreetype --enable-frei0r --enable-libass --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-librtmp --enable-libspeex --enable-libsoxr --enable-libzmq --enable-libzimg --disable-libjack --disable-indev=jack --enable-videotoolbox
libavutil 56. 51.100 / 56. 51.100
libavcodec 58. 91.100 / 58. 91.100
libavformat 58. 45.100 / 58. 45.100
libavdevice 58. 10.100 / 58. 10.100
libavfilter 7. 85.100 / 7. 85.100
libavresample 4. 0. 0 / 4. 0. 0
libswscale 5. 7.100 / 5. 7.100
libswresample 3. 7.100 / 3. 7.100
libpostproc 55. 7.100 / 55. 7.100
Input #0, matroska,webm, from 'samplegreen.webm':
Metadata:
encoder : Chrome
Duration: N/A, start: 0.000000, bitrate: N/A
Stream #0:0(eng): Video: vp8, yuv420p(progressive), 1280x720, SAR 1:1 DAR 16:9, 1k tbr, 1k tbn, 1k tbc (default)
Metadata:
alpha_mode : 1
Stream #0:1(eng): Audio: opus, 48000 Hz, mono, fltp (default)
Input #1, png_pipe, from 'foregrounds/myimage.png':
Duration: N/A, bitrate: N/A
Stream #1:0: Video: png, rgba(pc), 350x86, 25 tbr, 25 tbn, 25 tbc
Input #2, mov,mp4,m4a,3gp,3g2,mj2, from 'backgrounds/beachsunsetmp4.mp4':
Metadata:
major_brand : mp42
minor_version : 0
compatible_brands: mp42mp41
creation_time : 2021-02-16T18:24:40.000000Z
Duration: 00:00:32.53, start: 0.000000, bitrate: 3032 kb/s
Stream #2:0(eng): Video: h264 (Main) (avc1 / 0x31637661), yuv420p(tv, bt709), 1280x720, 3027 kb/s, 29.97 fps, 29.97 tbr, 30k tbn, 59.94 tbc (default)
Metadata:
creation_time : 2021-02-16T18:24:40.000000Z
handler_name : ?Mainconcept Video Media Handler
encoder : AVC Coding
[mp3 # 0x7f86cf809000] Estimating duration from bitrate, this may be inaccurate
Input #3, mp3, from 'backgrounds/beachsunsetmp4.mp3':
Metadata:
date : 2021-02-18 06:49
id3v2_priv.XMP : <?xpacket begin="\xef\xbb\xbf" id="W5M0MpCehiHzreSzNTczkc9d"?>\x0a<x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="Adobe XMP Core 6.0-c003 79.164527, 2020/10/15-17:48:32 ">\x0a <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">\x0a <rdf
Duration: 00:00:32.60, start: 0.000000, bitrate: 132 kb/s
Stream #3:0: Audio: mp3, 48000 Hz, stereo, fltp, 128 kb/s
[Parsed_overlay_2 # 0x7f86cd4039c0] Media type mismatch between the 'Parsed_overlay_2' filter output pad 0 (video) and the 'Parsed_amerge_3' filter input pad 0 (audio)
[AVFilterGraph # 0x7f86cd402a40] Cannot create the link overlay:0 -> amerge:0
Error initializing complex filters.
Invalid argument
As far as I can tell, the issue is that the filter, amerge, wants 2 audio streams. Normally, I could take the input stream argument (which is a video), and make it use the audio by doing something like [0:a][1:a]amerge=inputs=2[results]. However, since my input stream is the output of a preceding filter, that doesn't seem to work (i.e. [output_2:a]). It bombs out with:
[matroska,webm # 0x7fecca000000] Invalid stream specifier: output_2:a.
Last message repeated 1 times
Stream specifier 'output_2:a' in filtergraph description [0]overlay=(W-w)/2:H-h[output_1]; [output_1]chromakey=0x005d0b:0.1485:0.03[ckout]; [2:v][ckout]overlay[output_2]; [output_2:a][3:a] amerge=inputs=2 [output_3] matches no streams.
So all of that said... Is there a way to specify that I'd like to use the audio stream from the output of a preceding filter? Or any other ways to combine all of these filters into a single command?
Thanks.
Any help would be greatly appreciated!
Except for a few filters like concat, a filter will take either only video inputs or only audio.
Here's the combined command.
ffmpeg \
-i samplegreen.webm \
-i foregrounds/myimage.png \
-i backgrounds/beachsunsetmp4.mp4 \
-i backgrounds/beachsunsetmp4.mp3 \
-filter_complex \
"[0][1]overlay=(W-w)/2:H-h,chromakey=0x005d0b:0.1485:0.03[ckout]; \
[2][ckout]overlay=shortest=1[v]; \
[0]adelay=700:all=1[0a]; \
[0a][3]amerge=inputs=2[a]" \
-map '[v]' -map '[a]' \
-shortest -r 30 -ac 2 \
output.mp4

Add another audio over a file with mixed audio tracks in ffmpeg

I have a file that was formed by concatenating three different files: a.mp4, b.mp4 and c.mp4.
do ffmpeg -f concat -i "concat-file.txt" -map 0:v -map 0:a -c:v libx264 -crf 23 -fflags +genpts joined-file.mp4"
After that I run this command, mentioned here: How to add a new audio (not mixing) into a video using ffmpeg?
ffmpeg -i joined-file.mp4 -i audio.mp3 -filter_complex "[0:a][1:a]amerge=inputs=2[a]" -map 0:v -map "[a]" -c:v copy -ac 2 -shortest output.mp4
What is causing this issue?
Thanks. :)
UPDATE:
Here are the commands that I have been running:
ffmpeg -i "middle/b.mp4" -c:v copy -video_track_timescale 30k -c:a aac -ac 6 -ar 44100 -shortest "wrap/b.mp4"
ffmpeg -f concat -i "concat-file.txt" -map 0:v -map 0:a -c:v libx264 -crf 23 -fflags +genpts "joined/abc.mp4"
ffmpeg -i "joined/abc.mp4" -i audio.mp3 -filter_complex "[0:a:0][1:a:0]amerge=inputs=2[a]" -map 0:v -map "[a]" -c:v copy -ac 2 -shortest "final/abc-cmplt.mp4"
Here is the "concat-file.txt":
file 'bits/a.mp4'
file 'wrap/b.mp4'
file 'bits/c.mp4'
All the video files a.mp4, b.mp4 and c.mp4 have their original audio. After I run the commands above, the joint video abc-cmplt.mp4 has combined audio (audio.mp3 plus their own) for the first (a.mp4) and last parts (c.mp4). However, the middle part only has its own audio and the extra audio I am trying to add does not seem to merge with the audio of b.mp4 in the final joint file.
Output of ffmpeg -i bits/a.mp4 -i wrap/b.mp4 -i bits/c.mp4:
ffmpeg version 4.2.2 Copyright (c) 2000-2019 the FFmpeg developers
built with gcc 9.2.1 (GCC) 20200122
configuration: --enable-gpl --enable-version3 --enable-sdl2 --enable-fontconfig --enable-gnutls --enable-iconv --enable-libass --enable-libdav1d --enable-libbluray --enable-libfreetype --enable-libmp3lame --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-libopus --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libtheora --enable-libtwolame --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libzimg --enable-lzma --enable-zlib --enable-gmp --enable-libvidstab --enable-libvorbis --enable-libvo-amrwbenc --enable-libmysofa --enable-libspeex --enable-libxvid --enable-libaom --enable-libmfx --enable-amf --enable-ffnvcodec --enable-cuvid --enable-d3d11va --enable-nvenc --enable-nvdec --enable-dxva2 --enable-avisynth --enable-libopenmpt
libavutil 56. 31.100 / 56. 31.100
libavcodec 58. 54.100 / 58. 54.100
libavformat 58. 29.100 / 58. 29.100
libavdevice 58. 8.100 / 58. 8.100
libavfilter 7. 57.100 / 7. 57.100
libswscale 5. 5.100 / 5. 5.100
libswresample 3. 5.100 / 3. 5.100
libpostproc 55. 5.100 / 55. 5.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'bits/a.mp4':
Metadata:
major_brand : mp42
minor_version : 0
compatible_brands: mp42mp41
creation_time : 2021-03-10T08:50:04.000000Z
Duration: 00:00:01.05, start: 0.000000, bitrate: 1846 kb/s
Stream #0:0(eng): Video: h264 (Main) (avc1 / 0x31637661), yuv420p, 1080x1920 [SAR 1:1 DAR 9:16], 1462 kb/s, 30 fps, 30 tbr, 30k tbn, 60 tbc (default)
Metadata:
creation_time : 2021-03-10T08:50:04.000000Z
handler_name : ?Mainconcept Video Media Handler
encoder : AVC Coding
Stream #0:1(eng): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 317 kb/s (default)
Metadata:
creation_time : 2021-03-10T08:50:04.000000Z
handler_name : #Mainconcept MP4 Sound Media Handler
Input #1, mov,mp4,m4a,3gp,3g2,mj2, from 'wrap/b.mp4':
Metadata:
major_brand : isom
minor_version : 512
compatible_brands: isomiso2avc1mp41
encoder : Lavf58.29.100
Duration: 00:00:27.93, start: 0.000000, bitrate: 234 kb/s
Stream #1:0(und): Video: h264 (High) (avc1 / 0x31637661), yuv420p, 1080x1920, 231 kb/s, 30 fps, 30 tbr, 30k tbn, 60 tbc (default)
Metadata:
handler_name : VideoHandler
Input #2, mov,mp4,m4a,3gp,3g2,mj2, from 'bits/c.mp4':
Metadata:
major_brand : mp42
minor_version : 0
compatible_brands: mp42mp41
creation_time : 2021-03-10T08:42:52.000000Z
Duration: 00:00:01.05, start: 0.000000, bitrate: 1829 kb/s
Stream #2:0(eng): Video: h264 (Main) (avc1 / 0x31637661), yuv420p, 1080x1920 [SAR 1:1 DAR 9:16], 1320 kb/s, 30 fps, 30 tbr, 30k tbn, 60 tbc (default)
Metadata:
creation_time : 2021-03-10T08:42:52.000000Z
handler_name : ?Mainconcept Video Media Handler
encoder : AVC Coding
Stream #2:1(eng): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 317 kb/s (default)
Metadata:
creation_time : 2021-03-10T08:42:52.000000Z
handler_name : #Mainconcept MP4 Sound Media Handler
All files to be concatenated by the concat demuxer must have these same attributes. b.mp4 has a different H.264 profile and lacks audio. Fix that:
ffmpeg -i "middle/b.mp4" -f lavfi -i anullsrc=cl=stereo:r=48000 -c:v libx264 -profile:v main -video_track_timescale 30k -shortest "wrap/b.mp4"
Then concatenate and mix the audio:
ffmpeg -f concat -i "concat-file.txt" -i audio.mp3 -c:v libx264 -crf 23 -filter_complex "[0:a:0][1:a:0]amerge=inputs=2" -ac 2 "joined/abc.mp4"
Option
Description
-f lavfi
Tell ffmpeg the following input is a filter instead of a file.
-i anullsrc=cl=stereo:r=48000
Use the anullsrc filter to generate silent stereo audio with 48000 sample rate.
-profile:v main
Set H.264 Profile to Main.
-video_track_timescale 30k
Set timescale to 30k to match the other videos (30k tbn).

FFMPEG command causing audio issues

I am converting multiple mp4 video to ts and then stitching it together.
But this sometimes causes audio issues on my videos where the audio sounds like it was recorded with two mics at the same time causing loud sound.
I can only reproduce it sometimes and I am still not sure why it's doing that? Can anyone help?
Here is how I am converting to ts from mp4. I have noticed that the longer the video gets, the audio gets worse and its also off by a couple of seconds.
ffmpeg -i video1.mp4 -f lavfi -i anullsrc=channel_layout=mono:sample_rate=48000 -shortest -c copy -bsf:v h264_mp4toannexb -c:a aac video1.ts
ffmpeg -i video2.mp4 -f lavfi -i anullsrc=channel_layout=mono:sample_rate=48000 -shortest -c copy -bsf:v h264_mp4toannexb -c:a aac video2.ts
ffmpeg -i video3.mp4 -f lavfi -i anullsrc=channel_layout=mono:sample_rate=48000 -shortest -c copy -bsf:v h264_mp4toannexb -c:a aac video3.ts
and then I save these paths to a txt and call my stitching command like this
ffmpeg -f concat -safe 0 -i list.txt -c copy -bsf:a aac_adtstoasc finalvideo.mp4
Here is the complete output of the 4 videos
C:\Users\Alan\Desktop\videos>ffmpeg -i video1.mp4 -i video2.mp4 -i video3.mp4 -i video4.mp4
ffmpeg version N-90433-g5b31dd1c6b Copyright (c) 2000-2018 the FFmpeg developers
built with gcc 7.3.0 (GCC)
configuration: --enable-gpl --enable-version3 --enable-sdl2 --enable-bzlib --enable-fontconfig --enable-gnutls --enable-iconv --enable-libass --enable-libbluray --enable-libfreetype --enable-libmp3lame --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-libopus --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libtheora --enable-libtwolame --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libzimg --enable-lzma --enable-zlib --enable-gmp --enable-libvidstab --enable-libvorbis --enable-libvo-amrwbenc --enable-libmysofa --enable-libspeex --enable-libxvid --enable-libmfx --enable-amf --enable-ffnvcodec --enable-cuvid --enable-d3d11va --enable-nvenc --enable-nvdec --enable-dxva2 --enable-avisynth
libavutil 56. 12.100 / 56. 12.100
libavcodec 58. 15.100 / 58. 15.100
libavformat 58. 10.100 / 58. 10.100
libavdevice 58. 2.100 / 58. 2.100
libavfilter 7. 13.100 / 7. 13.100
libswscale 5. 0.102 / 5. 0.102
libswresample 3. 0.101 / 3. 0.101
libpostproc 55. 0.100 / 55. 0.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'video1.mp4':
Metadata:
major_brand : isom
minor_version : 512
compatible_brands: isomiso2avc1mp41
encoder : Lavf57.71.100
Duration: 00:00:10.80, start: 0.000000, bitrate: 1034 kb/s
Stream #0:0(eng): Video: h264 (Constrained Baseline) (avc1 / 0x31637661), yuv420p, 1280x720 [SAR 1:1 DAR 16:9], 879 kb/s, 4.17 fps, 4.17 tbr, 12800 tbn, 8.33 tbc (default)
Metadata:
handler_name : VideoHandler
Stream #0:1(eng): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, mono, fltp, 165 kb/s (default)
Metadata:
handler_name : SoundHandler
Input #1, mov,mp4,m4a,3gp,3g2,mj2, from 'video2.mp4':
Metadata:
major_brand : isom
minor_version : 512
compatible_brands: isomiso2avc1mp41
encoder : Lavf57.71.100
Duration: 00:00:01.62, start: 0.000000, bitrate: 3208 kb/s
Stream #1:0(eng): Video: h264 (Constrained Baseline) (avc1 / 0x31637661), yuv420p, 1280x720 [SAR 1:1 DAR 16:9], 3203 kb/s, 16.67 fps, 16.67 tbr, 12800 tbn, 33.33 tbc (default)
Metadata:
handler_name : VideoHandler
Input #2, mov,mp4,m4a,3gp,3g2,mj2, from 'video3.mp4':
Metadata:
major_brand : isom
minor_version : 512
compatible_brands: isomiso2avc1mp41
encoder : Lavf57.71.100
Duration: 00:00:05.58, start: 0.000000, bitrate: 1954 kb/s
Stream #2:0(eng): Video: h264 (Constrained Baseline) (avc1 / 0x31637661), yuv420p, 1280x720 [SAR 1:1 DAR 16:9], 1805 kb/s, 16.67 fps, 16.67 tbr, 12800 tbn, 33.33 tbc (default)
Metadata:
handler_name : VideoHandler
Stream #2:1(eng): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, mono, fltp, 166 kb/s (default)
Metadata:
handler_name : SoundHandler
Input #3, mov,mp4,m4a,3gp,3g2,mj2, from 'video4.mp4':
Metadata:
major_brand : isom
minor_version : 512
compatible_brands: isomiso2avc1mp41
encoder : Lavf57.71.100
Duration: 00:00:03.90, start: 0.000000, bitrate: 1746 kb/s
Stream #3:0(eng): Video: h264 (Constrained Baseline) (avc1 / 0x31637661), yuv420p, 1280x720 [SAR 1:1 DAR 16:9], 1744 kb/s, 16.67 fps, 16.67 tbr, 12800 tbn, 33.33 tbc (default)
Metadata:
handler_name : VideoHandler

MP4Box / FFMPEG concat loses audio after first clip

So I am certainly no expert when it comes to either of these tools, but I have a web-based project that's executing commands on an Amazon Linux server to concatenate two video files that are uploaded.
Both files are converted to mp4s first using FFMPEG, and those play perfectly in a browser after conversion:
ffmpeg -i file1.mpg -c:v libx264 -crf 22 -c:a aac -strict -2 -movflags faststart file2.mp4
Then, I attempt to combine these two resulting mp4s into a single mp4. I tried using FFMPEG to do this but to no avail. Switching to try MP4Box got me much closer: the videos are concatenated together, but the audio stops playing at the end of the first clip, and the second clip is silent.
MP4Box -force-cat -keepsys -add file.mp4 -cat file2.mp4 out.mp4
I've tried varying versions of the above command with no better results. Any input is greatly appreciated.
EDIT: info on .mp4 files using
ffmpeg -i file1.mp4 -i file2.mp4
ffmpeg -i 1510189259715DogRunsintoGlassDoor_315a03a8e20acfc.mp4 -i
1510189273549NewhouseMoonMoonneverseenstairsbeforefunnydog_285a03a8e6aab25.mp4
ffmpeg version N-61041-g52a2138 Copyright (c) 2000-2014 the FFmpeg
developers
built on Mar 2 2014 05:45:04 with gcc 4.6 (Debian 4.6.3-1)
configuration: --prefix=/root/ffmpeg-static/64bit
--extra-cflags='-I/root/ffmpeg-static/64bit/include -static' --extra-ldflags='-L/root/ffmpeg-static/64bit/lib -static' --extra-libs='-lxml2 -lexpat -lfreetype' --enable-static --disable-shared --disable-ffserver --disable-doc --enable-bzlib --enable-zlib --enable-postproc --enable-runtime-cpudetect --enable-libx264 --enable-gpl --enable-libtheora --enable-libvorbis --enable-libmp3lame --enable-gray --enable-libass --enable-libfreetype --enable-libopenjpeg --enable-libspeex --enable-libvo-aacenc --enable-libvo-amrwbenc --enable-version3 --enable-libvpx
libavutil 52. 66.100 / 52. 66.100
libavcodec 55. 52.102 / 55. 52.102
libavformat 55. 33.100 / 55. 33.100
libavdevice 55. 10.100 / 55. 10.100
libavfilter 4. 2.100 / 4. 2.100
libswscale 2. 5.101 / 2. 5.101
libswresample 0. 18.100 / 0. 18.100
libpostproc 52. 3.100 / 52. 3.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from
'1510189259715DogRunsintoGlassDoor_315a03a8e20acfc.mp4':
Metadata:
major_brand : isom
minor_version : 512
compatible_brands: isomiso2avc1mp41
encoder : Lavf55.33.100
Duration: 00:00:04.92, start: 0.023220, bitrate: 634 kb/s
Stream #0:0(und): Video: h264 (High) (avc1 / 0x31637661), yuv420p,
360x360 [SAR 1:1 DAR 1:1], 501 kb/s, 30 fps, 30 tbr, 15360 tbn, 60 tbc
(default)
Metadata:
handler_name : VideoHandler
Stream #0:1(und): Audio: aac (mp4a / 0x6134706D), 44100 Hz, mono,
fltp, 132 kb/s (default)
Metadata:
handler_name : SoundHandler
Input #1, mov,mp4,m4a,3gp,3g2,mj2, from
'1510189273549NewhouseMoonMoonneverseenstairsbeforefunnydog_285a03a8e6aab25.mp4':
Metadata:
major_brand : isom
minor_version : 512
compatible_brands: isomiso2avc1mp41
encoder : Lavf55.33.100
Duration: 00:00:18.79, start: 0.023220, bitrate: 455 kb/s
Stream #1:0(und): Video: h264 (High) (avc1 / 0x31637661), yuv420p,
362x360 [SAR 1:1 DAR 181:180], 320 kb/s, 29.94 fps, 29.94 tbr, 11976
tbn, 59.88 tbc (default)
Metadata:
handler_name : VideoHandler
Stream #1:1(eng): Audio: aac (mp4a / 0x6134706D), 44100 Hz, stereo,
fltp, 129 kb/s (default)
Metadata:
handler_name : SoundHandler
At least one output file must be specified

FFMPEG - Merge mp4 files and audio file

I have an ffmpeg that merges 3 mp4 videos and then another command that adds audio to the output file from the first command. The commands are as follows:
ffmpeg -i vid-1.mp4 -i vid-2.mp4 -i vid-3.mp4 -filter_complex "[0:v][1:v][2:v]concat=n=3:v=1" -preset ultrafast -crf 1 output.mp4
ffmpeg -i output.mp4 -i audio.mp3 -preset ultrafast -crf 1 final.mp4
vid-1.mp4 (does NOT have audio stream)
vid-2.mp4 (does NOT have audio stream)
Is there anyway to do this in one command? I would like to also add the audio to the video that is getting created in the first command. Is this possible?
Console output of "ffmpeg -i vid-1.mp4 -i vid-2.mp4 -i vid-3.mp4 -i audio.mp3"
[jstevens#jr testing]$ ffmpeg -i vid-1.mp4 -i vid-2.mp4 -i vid-3.mp4 -i audio.mp3
ffmpeg version 3.0.2 Copyright (c) 2000-2016 the FFmpeg developers
built with gcc 6.1.1 (GCC) 20160510 (Red Hat 6.1.1-2)
configuration: --prefix=/usr --bindir=/usr/bin --datadir=/usr/share/ffmpeg --incdir=/usr/include/ffmpeg --libdir=/usr/lib64 --mandir=/usr/share/man --arch=x86_64 --optflags='-O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -m64 -mtune=generic' --enable-bzlib --disable-crystalhd --enable-frei0r --enable-gnutls --enable-ladspa --enable-libass --enable-libcdio --enable-libdc1394 --disable-indev=jack --enable-libfreetype --enable-libgsm --enable-libmp3lame --enable-openal --enable-libopencv --enable-libopenjpeg --enable-libopus --enable-libpulse --enable-libschroedinger --enable-libsoxr --enable-libspeex --enable-libtheora --enable-libvorbis --enable-libv4l2 --enable-libvpx --enable-libx264 --enable-libx265 --enable-libxvid --enable-x11grab --enable-avfilter --enable-avresample --enable-postproc --enable-pthreads --disable-static --enable-shared --enable-gpl --disable-debug --disable-stripping --shlibdir=/usr/lib64 --enable-runtime-cpudetect
libavutil 55. 17.103 / 55. 17.103
libavcodec 57. 24.102 / 57. 24.102
libavformat 57. 25.100 / 57. 25.100
libavdevice 57. 0.101 / 57. 0.101
libavfilter 6. 31.100 / 6. 31.100
libavresample 3. 0. 0 / 3. 0. 0
libswscale 4. 0.100 / 4. 0.100
libswresample 2. 0.101 / 2. 0.101
libpostproc 54. 0.100 / 54. 0.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'vid-1.mp4':
Metadata:
major_brand : isom
minor_version : 512
compatible_brands: isomiso2avc1mp41
encoder : Lavf56.25.101
Duration: 00:00:05.00, start: 0.000000, bitrate: 1085 kb/s
Stream #0:0(und): Video: h264 (High) (avc1 / 0x31637661), yuv420p, 1280x720 [SAR 1:1 DAR 16:9], 1081 kb/s, 25 fps, 25 tbr, 12800 tbn, 50 tbc (default)
Metadata:
handler_name : VideoHandler
Input #1, mov,mp4,m4a,3gp,3g2,mj2, from 'vid-2.mp4':
Metadata:
major_brand : isom
minor_version : 512
compatible_brands: isomiso2avc1mp41
encoder : Lavf56.25.101
Duration: 00:00:05.00, start: 0.000000, bitrate: 1018 kb/s
Stream #1:0(und): Video: h264 (High) (avc1 / 0x31637661), yuv420p, 1280x720 [SAR 1:1 DAR 16:9], 1014 kb/s, 25 fps, 25 tbr, 12800 tbn, 50 tbc (default)
Metadata:
handler_name : VideoHandler
Input #2, mov,mp4,m4a,3gp,3g2,mj2, from 'vid-3.mp4':
Metadata:
major_brand : isom
minor_version : 512
compatible_brands: isomiso2avc1mp41
encoder : Lavf56.25.101
Duration: 00:00:05.00, start: 0.000000, bitrate: 823 kb/s
Stream #2:0(und): Video: h264 (High) (avc1 / 0x31637661), yuv420p, 1280x720 [SAR 1:1 DAR 16:9], 819 kb/s, 25 fps, 25 tbr, 12800 tbn, 50 tbc (default)
Metadata:
handler_name : VideoHandler
[mp3 # 0x1ca30c0] Skipping 0 bytes of junk at 0.
[mp3 # 0x1ca30c0] Estimating duration from bitrate, this may be inaccurate
Input #3, mp3, from 'audio.mp3':
Duration: 00:00:19.57, start: 0.000000, bitrate: 64 kb/s
Stream #3:0: Audio: mp3, 44100 Hz, mono, s16p, 64 kb/s
At least one output file must be specified
ffmpeg -i vid-1.mp4 -i vid-2.mp4 -i vid-3.mp4 -i audio.mp3 \
-filter_complex "[0:v][1:v][2:v]concat=n=3:v=1:a=0[v]" \
-map "[v]" -map 3:a -shortest output.mp4
I recommend to manually define mappings with -map instead of relying on the default stream selection behavior.
The -shortest option is added because the concatenated video duration is shorter than the audio duration.

Resources