FFmpeg: alternating audio languages in resulting movie for language learning

I need to convert a video with multiple audio languages into a video with a single audio stream in which two languages alternate repeatedly:
(10 sec Lang2) + (15 sec Lang3) + (10 sec Lang2) + (15 sec Lang3) + ... and so on till the end.
I assume it should be done by piping in and switching audio streams (I've read the ffmpeg piping documentation but didn't quite understand it).
I've done this audio-switching task before (with scripting on Windows) by switching audio languages live in a video player, but I need a better, cross-platform solution for a little kid: a pre-prepared video.
If possible, I'd also like to adjust the loudness of one of the input audio streams to match the other.
P.S. I think this would be useful for many (programmer) parents who want to show little kids bilingual cartoons, to prepare them for language learning. By balancing the 10/15-second intervals you can retain the kid's attention; the older they grow, the more native language they demand.
Irrelevant, just to show what my experience is:
%ffmpeg% -y -f concat -safe 0 -i %playlist% -i %picture% -map:v 0 -map:v 1 -c:v copy -disposition:v:0 attached_pic -ac 1 -af aresample=resampler=soxr -ar 16000 -%title% -%album% -%artist% %lyrics% -c:a aac -q:a 1 %output%
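One way to attack this without piping is the volume filter's timeline option: mute each language in alternating windows, then mix the two streams back together. A minimal sketch, assuming the source is input.mkv and that Lang2 and Lang3 are its first and second audio streams (a 10 s + 15 s = 25 s cycle):

ffmpeg -i input.mkv -filter_complex \
"[0:a:0]volume=0:enable='gte(mod(t,25),10)'[l2]; \
 [0:a:1]volume=0:enable='lt(mod(t,25),10)'[l3]; \
 [l2][l3]amix=inputs=2:duration=first,volume=2[aout]" \
-map 0:v -map "[aout]" -c:v copy -c:a aac out.mkv

amix halves each input by default, hence the volume=2 after it. To balance the loudness of the two languages, put an extra volume=... (or a loudnorm pass) on the quieter stream before the mix.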

Related

Mixing various audio and video sources into a single video

I've already read FFmpeg - Overlay one video onto another video?, How to overlay 2 videos at different time over another video in single ffmpeg command?, FFmpeg - Multiple videos with 4 areas and different play times (and many similar questions tagged [ffmpeg] about setpts). The following code works, but I'm sure it can be simplified into a more elegant solution.
I'd like to mix multiple sources (image and sound) with different starting points:
t (seconds)        0   1   2   3   4   5   6   7   8   9   10  11  12  13
test.png           [---------------------------------------------------]
a.mp3                      [-------]
without_sound.mp4                              [-------------------]  (overlay at x,y=200,200)
b.mp3                                  [---]
with_sound.mp4             [---------------------------------------] (overlay at x,y=100,100)
This works:
ffmpeg -i test.png
-t 2 -i a.mp3
-t 5 -i without_sound.mp4
-t 1 -i b.mp3
-t 10 -i with_sound.mp4
-filter_complex "
[0]setpts=PTS-STARTPTS[s0];
[1]adelay=2000^|2000[s1];
[2]setpts=PTS-STARTPTS+7/TB[s2];
[3]adelay=5000^|5000[s3];
[4]setpts=PTS-STARTPTS+3/TB[s4];
[4:a]adelay=3000^|3000[t4];
[s1][s3][t4]amix=inputs=3[outa];
[s0][s4]overlay=100:100[o2];
[o2][s2]overlay=200:200[outv]
" -map [outa] -map [outv]
out.mp4 -y
but:
is it normal that we have to use both setpts and adelay? I tried without adelay, and then the sound was not shifted. Put differently, is there a way to simplify:
[4]setpts=PTS-STARTPTS+3/TB[s4];
[4:a]adelay=3000^|3000[t4];
?
is there a way to do it with setpts and asetpts only? When I replaced adelay=5000|5000 with asetpts=PTS-STARTPTS+5/TB (and similarly for the other one), it didn't give the expected time-shifting (see below)
in similar questions/answers I often see overlay=...:enable='between(t,...,...)', here it seems it is not needed, why?
More generally, how would you simplify this "mix multiple audio and video" ffmpeg code?
More details about the second bullet point: if we replace adelay by asetpts,
-filter_complex "
[0]setpts=PTS-STARTPTS[s0];
[1]asetpts=PTS-STARTPTS+2/TB[s1];
[2]setpts=PTS-STARTPTS+7/TB[s2];
[3]asetpts=PTS-STARTPTS+5/TB[s3];
[4]setpts=PTS-STARTPTS+3/TB[s4];
[4:a]asetpts=PTS-STARTPTS+3/TB[t4];
[s1][s3][t4]amix=inputs=3[outa];
[s0][s4]overlay=100:100[o2];
[o2][s2]overlay=200:200[outv]
it doesn't work: [3] should begin at 0'05", and [4:a] at 0'03", but they all begin at the same time as [1], i.e. at 0'02".
It seems that amix only takes the first asetpts in consideration, and discards the others; is it true?
is it normal that we have to use both setpts and adelay?
Yes: the former is for video streams; the latter, for audio. asetpts is not suitable for use with amix, since amix ignores starting time offsets. adelay fills in with silence from 0 to the desired offset.
I often see overlay=...:enable='between(t,...,...)', here it seems it is not needed, why?
Overlay syncs its main and overlay video frames by timestamps. enable is needed if one wishes to disable overlay when synced frames are available for both inputs.
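To see the difference concretely, a minimal two-input test (file names are placeholders):

ffmpeg -i a.mp3 -i b.mp3 -filter_complex "[1:a]adelay=5000|5000[d];[0:a][d]amix=inputs=2[out]" -map "[out]" mixed.mp3

Here b.mp3 audibly starts 5 seconds in, because adelay prepends real silence samples; swapping it for asetpts=PTS-STARTPTS+5/TB would only rewrite timestamps, which amix discards.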

FFMPEG command to mix audio and video with adjustable volume

I have:
Video file of X length
Audio of Y length
I am trying to achieve an output video that has the following qualities:
The volume level of the added audio should be adjustable
The audio should loop till the end of the video
It should not break even if the input video does not have any audio
I should be able to mute the audio of the source video if needed.
All of the above, in the fastest possible way.
I'm not well versed in FFmpeg; maybe some experts could help.
Since you are using a library, I assume that you know how to run pure FFmpeg commands.
Based on your third condition, we will divide the solution into two parts:
It should not break even if the input video does not have any audio
To cover this condition, you can check whether there is an audio stream in your video file before running any FFmpeg command, with the code below:
private boolean isVideoContainAudioStream(String videoPath) {
    MediaMetadataRetriever retriever = new MediaMetadataRetriever();
    retriever.setDataSource(videoPath);
    String hasAudioStream = retriever.extractMetadata(MediaMetadataRetriever.METADATA_KEY_HAS_AUDIO);
    retriever.release(); // free the retriever's native resources
    return hasAudioStream != null && hasAudioStream.equals("yes");
}
1. Part One:
So if the result of the above function is true, your video file contains an audio stream and you can run the command below:
ffmpeg -i video.mp4 -filter_complex "amovie=/path/to/audio/file/audio.mp3:loop=0,asetpts=N/SR/TB,volume=2.0[audio];[0:a]volume=0.5[sa];[sa][audio]amix[fa]" -map 0:v -map [fa] -vcodec libx264 -preset ultrafast -shortest fout.mp4
In the above command we load the audio file from a specific path with the amovie filter:
loop=0: loop the audio infinitely
asetpts=N/SR/TB: generate timestamps by counting samples
volume=2.0: multiply the audio volume by 2.0
The video's own audio stream is accessible through the [0:a] filter pad, so we take it, set the volume to half of the input volume, and name it [sa]. Obviously, if you want to mute the audio of the source video, you change that part to:
[0:a]volume=0.0[sa]
After that we mix the two audio streams using the amix filter and name the result [fa]. So far we have everything we wanted; we just need to map the audio and video streams into the output.
-vcodec libx264: we use x264 video encoding because it has lots of options for tuning performance and speed
-shortest: since we loop the audio infinitely, this tells ffmpeg to stop creating frames when the shortest stream ends (the video stream is the shorter one for sure)
-preset ultrafast: preset is one of the x264 options; ultrafast gives you the most encoding speed at the cost of a larger output file. Usually veryfast is a good compromise between speed and size
2. Part Two :
If the isVideoContainAudioStream function returns false (which means your input video has no audio), you can run the command below:
ffmpeg -i mute_video.mp4 -filter_complex "amovie=/path/to/audio/file/audio.mp3:loop=0,asetpts=N/SR/TB,volume=2.0[audio]" -map 0:v -map [audio] -vcodec libx264 -preset ultrafast -crf 18 -shortest m_fout.mp4
In the above command we use another x264 option, called CRF:
Constant Rate Factor (CRF)
Use this rate control mode if you want to keep the best quality and care less about the file size. This is the recommended rate control mode for most uses.
The range of the CRF scale is 0–51, where 0 is lossless, 23 is the default, and 51 is worst quality possible. A lower value generally leads to higher quality, and a subjectively sane range is 17–28. Consider 17 or 18 to be visually lossless or nearly so; it should look the same or nearly the same as the input but it isn't technically lossless.
The range is exponential, so increasing the CRF value +6 results in roughly half the bitrate / file size, while -6 leads to roughly twice the bitrate.
Choose the highest CRF value that still provides an acceptable quality. If the output looks good, then try a higher value. If it looks bad, choose a lower value.
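A quick way to see this trade-off for yourself (input.mp4 as a placeholder; the CRF values are just illustrative endpoints of the sane range):

ffmpeg -i input.mp4 -c:v libx264 -preset veryfast -crf 18 nearly_lossless.mp4
ffmpeg -i input.mp4 -c:v libx264 -preset veryfast -crf 28 much_smaller.mp4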
That's it. There are lots of options for the x264 encoder; you can check all the available options at this link:
H.264 Video Encoding Guide

Can FFmpeg pause a live audio recording when dB levels fall below a threshold?

I am in the process of converting hundreds of audio cassettes to FLAC files. I have used Audacity, and RecordPad by NCH Software, to set decibel (dB) thresholds so that when my devices stop and require a tape flip, the recording also pauses.
I would like to move my tape-playing devices (USB-based) to Ubuntu 18.04 and record from them with FFmpeg, but I was wondering if FFmpeg has the ability to pause the recording when the audio signal falls below a configurable threshold.
This helps me reduce storage waste. Soon I have to convert some older audio reels as well, for which the problem will get worse.
The problem with Audacity is the limited number of instances. If Audacity allowed me to run multiple instances (one for each playback device), I wouldn't be asking this question.
I'm a bit of a noob at shell scripting... here is the end of my current shell script:
nohup ffmpeg -nostdin -f alsa -i hw:"$DEVICE" -t $DURATION -filter:a volumedetect -ar 48000 -ac 2 -b:a 320k "$TITLE".flac 2> "$TITLE".log &
Thanks in advance!
Deep
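FFmpeg can't literally pause a live capture, but its silenceremove audio filter can drop the quiet stretches on the fly, which achieves the same storage saving. A sketch based on the script above (the -40dB threshold and the 5-second grace period are assumptions you'd tune per deck; -b:a is dropped since FLAC is lossless and ignores it):

nohup ffmpeg -nostdin -f alsa -i hw:"$DEVICE" -t $DURATION \
  -af silenceremove=start_periods=1:stop_periods=-1:stop_duration=5:stop_threshold=-40dB \
  -ar 48000 -ac 2 "$TITLE".flac 2> "$TITLE".log &

start_periods=1 trims leading silence; stop_periods=-1 tells the filter to also remove silence from the middle of the stream once it lasts longer than stop_duration.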

Using FFmpeg or Similar to Normalize audio in a video to EBU R128 standard

This is my first time asking a question here on Stack Overflow.
I am stuck and really struggling with this. I am trying to get the audio of some of my MXF video files to comply with the EBU R128 standard.
This means the integrated loudness has to be -23 LUFS, with a tolerance of ±0.5 LU.
My current process:
Watch_folder > Encoding to MXF > Output_folder
I need to make sure that by the time they reach the output folder, those MXF files are EBU R128 loudness compliant.
What I have done so far:
FFMPEG:
ffmpeg -i input.mxf -af loudnorm=I=-23:LRA=7:tp=-2:print_format=json -f null -
I got this result:
Input Integrated: -15.1 LUFS
Input True Peak: +0.0 dBTP
Input LRA: 17.1 LU
Input Threshold: -26.2 LUFS
Output Integrated: -17.1 LUFS
Output True Peak: -1.5 dBTP
Output LRA: 5.3 LU
Output Threshold: -27.6 LUFS
Normalization Type: Dynamic
Target Offset: +1.1 LU
Then I did:
ffmpeg -i input.mxf -af loudnorm=I=-23:LRA=7:tp=-2:measured_I=-15.1:measured_LRA=17.1:measured_tp=0:measured_thresh=-27.6:offset=1.1 -ar 48k -y output.mxf
However, when I put it through the software Eff, it says that it's not EBU compliant.
EDIT:
This also reduces the quality. For example, my 6 GB file becomes 250 MB, and you can tell the quality has degraded.
ffmpeg-normalize
I did the following:
ffmpeg-normalize input.mxf -c:a pcm_s32le -ar 48000 -o output.mxf
but this gives me errors.
If I do it without the output file type, I get an MKV, which will not work for me; I need it to be MXF.
OK, a few issues here.
Firstly, a note on the numbers: -26.2 LUFS is the measured threshold, not the programme loudness. Your integrated loudness is -15.1 LUFS, so the audio needs to come down by about 8 dB to reach -23 LUFS. The bigger obstacle is your loudness range: 17.1 LU against a target of 7, which is why loudnorm fell back to dynamic normalisation (linear, gain-only correction requires the source LRA to be no higher than the target). To fix that you'll need to compress (dynamic range compression, not file/rate compression) the audio or at least use a limiter.
A good R128 audio track should be mixed properly rather than just run through a normaliser; otherwise you risk either failing the standard or unwanted audio effects.
If you don't have access to audio editing software, or to someone who can do this for you, FFmpeg does include an audio limiter, which can reduce the loudness range enough to bring the level to -23 LUFS.
You can do that with something like this:
-filter_complex alimiter=level_in=1:level_out=1:limit=1.5:attack=7:release=100:level=disabled
However, tuning a limiter well depends on what the video file is of (music, speech, etc) and it is something that's worth taking some time over. Alter the attack and release values until you get the result you want.
Secondly, the reason FFmpeg has produced a smaller file of lower quality is that you didn't specify anything for the video stream. FFmpeg's default action with video is (usually) to encode to H.264, so whatever your codec is (I am assuming DNxHD, given the MXF wrapper) needs to be specified. FFmpeg will copy the video stream through untouched if you include the option -c:v copy (which basically means "copy the video codec").
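Putting both points together, a sketch of a corrected second pass that leaves the video stream untouched (note measured_thresh should take the input threshold from your first pass, -26.2, not the output one; pcm_s24le as the MXF audio codec is an assumption):

ffmpeg -i input.mxf -c:v copy \
  -af loudnorm=I=-23:LRA=7:tp=-2:measured_I=-15.1:measured_LRA=17.1:measured_tp=0:measured_thresh=-26.2:offset=1.1 \
  -ar 48000 -c:a pcm_s24le output.mxf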
Post your results once you have tried these...!

ffmpeg: How to assign an empty soundtrack to a video?

I'm using ffmpeg to build a short hunk of video from a machine-generated png. This is working, but the video now needs to have a soundtrack (an [audio] field) for some of the other things I'm doing with it. I don't actually want any sound in the video, so: is there a way to get ffmpeg to simply set up an empty soundtrack property in the video, perhaps as part of the call that creates the video? I guess I could make an n-second long silent mp3 and bash it in, but is there a simpler / more direct way? Thanks!
Thanks to @Alvaro for the links; one of these worked after a bit of massaging. It does seem to be a two-step process: first make the soundtrack-less video, and then do:
ffmpeg -ar 44100 -acodec pcm_s16le -f s16le -ac 2 -channel_layout 2.1
-i /dev/zero -i in.mp4 -vcodec copy -acodec libfaac -shortest out.mp4
The silence comes from /dev/zero and -shortest makes the process stop at the end of the video. Argument order is significant here; -shortest needs to be down near the output file spec.
This assumes that your ffmpeg installation has libfaac installed, which it might not. But, otherwise, this seems to be working.
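On newer FFmpeg builds, where libfaac is no longer available, the same result can be achieved in one step with the anullsrc source and the native AAC encoder; a sketch (untested):

ffmpeg -i in.mp4 -f lavfi -i anullsrc=channel_layout=stereo:sample_rate=44100 \
  -map 0:v -map 1:a -c:v copy -c:a aac -shortest out.mp4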
I guess you need to create the media file properly, with both an audio and a video stream. As far as I know, there is no direct way.
If you know your video's duration, first create the dummy audio, and then, when you create the video, join the audio part.
On Super User you can find more info: link1 link2
