Ducking music using FFmpeg

I want to duck a music track under a speech track using FFmpeg (ducking is commonly used to lower background music whenever a person speaks, then raise it again when that person finishes speaking). How can I do this with FFmpeg, or with any other tool?
There is an FFmpeg filter called sidechaincompress which merges the two, but it doesn't seem to "duck" them. It takes two audio inputs: the first input is compressed depending on the signal of the second input, and the compressed signal is then merged with the second input:
ffmpeg -i main.flac -i sidechain.flac -filter_complex "[1:a]asplit=2[sc][mix];[0:a][sc]sidechaincompress[compr];[compr][mix]amerge" output.flac
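In practice, sidechaincompress can duck if you push its parameters: a low threshold and a high ratio pull the music down hard whenever speech is present, and the release time controls how quickly it comes back up. A minimal sketch, with illustrative file names and parameter values (assumptions to tune by ear, not canonical settings):
ffmpeg -i music.mp3 -i speech.wav -filter_complex \
"[1:a]asplit=2[sc][mix];\
[0:a][sc]sidechaincompress=threshold=0.02:ratio=8:attack=5:release=400[ducked];\
[ducked][mix]amix=inputs=2[out]" \
-map "[out]" ducked.mp3
The speech is split into two copies: one drives the compressor acting on the music, the other is mixed back on top of the ducked result.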

Related

How do I mix multiple audio tracks with FFmpeg and adjust each volume?

Let's say I have an input .mp4 file that contains 4 audio tracks.
How can I change their volumes independently and convert it to a new file that contains all 4 audio tracks mixed together and stored in the first audio track? For example, I want the first, second, and third audio tracks from the input file to be double their original volume and the fourth to be half its original volume, all saved in the output file's first audio track. What would that command look like?
Here you can find many good answers: How to overlay/downmix two audio files using ffmpeg
where the most comprehensive one links to https://trac.ffmpeg.org/wiki/AudioChannelManipulation
I recently had a similar use case: freely mixing 6 mono tracks of a multi-track recording to stereo output with different volumes on either or both output channels, which can be achieved like this:
ffmpeg -i 0.flac -i 1.flac -i 2.flac -i 3.flac -i 4.flac -i 5.flac \
-filter_complex "[0:a][1:a][2:a][3:a][4:a][5:a]amerge=inputs=6,pan=stereo|c0=c0+1.2*c1+1.2*c2+1.3*c3+c4|c1=c0+1.3*c3+c4+0.8*c5[a]" \
-map "[a]" output.flac
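Applied to the original four-track question, a minimal sketch along the same lines (stream indices and file names are assumptions; note that amix scales the inputs down by the number of inputs by default, so the mix sits lower overall):
ffmpeg -i input.mp4 -filter_complex \
"[0:a:0]volume=2.0[a0];[0:a:1]volume=2.0[a1];[0:a:2]volume=2.0[a2];[0:a:3]volume=0.5[a3];\
[a0][a1][a2][a3]amix=inputs=4[aout]" \
-map 0:v -map "[aout]" -c:v copy output.mp4
This leaves the video untouched and writes a single mixed track as the output file's only audio stream.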

Can ffmpeg transcode an audio track and add it as a second audio track at the same time, or if not, how to do it as separate commands?

A bit of history: I am using Plex as my media server, but for reasons unknown, it has issues transcoding the DTS-HD MA 7.1 audio to EAC3 stereo and keeps buffering (the server has plenty of horsepower on all fronts: CPU/RAM/drive space & speed, and gigabit network connections for all devices). The playback device (a TCL Roku TV with a 3rd-party soundbar connected via HDMI ARC) doesn't support the built-in 7.1 audio, so I get silence if I play the file back directly from a USB stick.
Also, I am by no means an ffmpeg guru; I figured out what I do know via Google University and asking questions, so please be kind and forgive me if I ask follow-up questions that may seem n00b-ish, and please provide example commands (preferably in the context of my command below, so that I have a known point of reference to start with).
I have a movie with 4K (HEVC Main 10 HDR) video and DTS-HD MA 7.1 audio. I want to leave the existing video and audio untouched, but add a 2nd audio track in either EAC3 or, if necessary, just AC3 in stereo.
So what I am looking for is as follows:
video.mkv
Existing->4k video file (no change)
Existing->7.1 audio (no change)
Convert and add->stereo audio as a 2nd audio track to the output.mkv file
Below is the command I've historically used with ffmpeg to convert and replace the audio file with the stereo audio, but since I'd prefer to leave the 7.1 audio in place, this doesn't work:
ffmpeg -i "D:\video.mkv" -c:v copy -c:a aac -b:a 128k "D:\output.mkv"
And if this cannot be done as a single command, please also let me know what steps I need to take to do it.
Thanks in advance,
Mike
ffmpeg -i input.mkv -map 0 -map 0:a -c copy -c:a:1 eac3 output.mkv
-map 0 selects all streams.
-map 0:a selects all audio streams. This combines with -map 0, so now you have 1 video and 2 audio streams selected.
-c copy stream copies all streams.
-c:a:1 eac3 encodes output audio stream #1 with the eac3 encoder. This overrides -c copy for that particular stream.
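Since the question asks for the second track specifically in stereo, you can also downmix just that one stream with -ac:a:1 2 (a sketch using the asker's paths; the eac3 encoder may not accept a 7.1 layout as-is, so the downmix also guards against that):
ffmpeg -i "D:\video.mkv" -map 0 -map 0:a -c copy -c:a:1 eac3 -ac:a:1 2 "D:\output.mkv"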

FFMPEG command to mix audio and video with adjustable volume

I have:
Video file of X length
Audio of Y length
I am trying to achieve an output video that has the following qualities:
The volume level of the added audio should be adjustable
The audio should loop till the end of the video
It should not break even if the input video does not have any audio
I should be able to mute the audio of the source video if needed.
All of the above, in the fastest possible way.
I'm not well versed with FFMPEG, maybe some experts could help.
Since you are using a library, I assume you know how to run raw FFmpeg commands.
Based on your third condition, we will divide the solution into two parts:
It should not break even if the input video does not have any audio
To cover this condition, you can check whether your video file contains an audio stream before running any FFmpeg command, using the code below:
private boolean isVideoContainAudioStream(String videoPath) {
    // Uses Android's MediaMetadataRetriever to check for an audio stream.
    MediaMetadataRetriever retriever = new MediaMetadataRetriever();
    try {
        retriever.setDataSource(videoPath);
        // METADATA_KEY_HAS_AUDIO is "yes" when the file contains audio.
        String hasAudioStream = retriever.extractMetadata(MediaMetadataRetriever.METADATA_KEY_HAS_AUDIO);
        return "yes".equals(hasAudioStream);
    } finally {
        retriever.release(); // free the underlying native resources
    }
}
1. Part One:
If the function above returns true, your video file contains an audio stream, so you can run the command below:
ffmpeg -i video.mp4 -filter_complex "amovie=/path/to/audio/file/audio.mp3:loop=0,asetpts=N/SR/TB,volume=2.0[audio];[0:a]volume=0.5[sa];[sa][audio]amix[fa]" -map 0:v -map [fa] -vcodec libx264 -preset ultrafast -shortest fout.mp4
In the above command, we pull the audio file from a specific path with the amovie filter:
loop=0, loop the audio infinitely
asetpts=N/SR/TB, generate timestamps by counting samples
volume=2.0, multiply the audio volume by 2.0
The video's own audio stream is accessible through the [0:a] filter pad, so we take it, set its volume to half of the input volume, and name it [sa]. Obviously, if you want to mute the audio of the source video, you change that part to:
[0:a]volume=0.0[sa]
After that, we mix the two audio streams using the amix filter and name the result [fa]. At this point we have everything we wanted; all that remains is to merge the audio and video streams.
-vcodec libx264, we use the x264 video encoder because it has lots of options for tuning performance and speed
-shortest, since we loop the audio infinitely, we tell FFmpeg to keep creating frames only until the shortest stream ends (the video stream is certainly the shorter one)
-preset ultrafast, preset is one of the x264 options; ultrafast gives the highest encoding speed at the cost of a larger output file. Using veryfast for this flag is usually a good balance of speed and size.
2. Part Two:
If the isVideoContainAudioStream function returns false (which means your input video has no audio stream), you can run the command below:
ffmpeg -i mute_video.mp4 -filter_complex "amovie=/path/to/audio/file/audio.mp3:loop=0,asetpts=N/SR/TB,volume=2.0[audio]" -map 0:v -map [audio] -vcodec libx264 -preset ultrafast -crf 18 -shortest m_fout.mp4
In the above command we use another x264 option, called CRF:
Constant Rate Factor (CRF)
Use this rate control mode if you want to keep the best quality and care less about the file size. This is the recommended rate control mode for most uses.
The range of the CRF scale is 0–51, where 0 is lossless, 23 is the default, and 51 is worst quality possible. A lower value generally leads to higher quality, and a subjectively sane range is 17–28. Consider 17 or 18 to be visually lossless or nearly so; it should look the same or nearly the same as the input but it isn't technically lossless.
The range is exponential, so increasing the CRF value +6 results in roughly half the bitrate / file size, while -6 leads to roughly twice the bitrate.
Choose the highest CRF value that still provides an acceptable quality. If the output looks good, then try a higher value. If it looks bad, choose a lower value.
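As a concrete sketch of the trade-off (file names assumed; by the rule of thumb above, the CRF 24 output should come out at roughly half the size of the CRF 18 one):
ffmpeg -i input.mp4 -c:v libx264 -preset veryfast -crf 18 -c:a copy out_crf18.mp4
ffmpeg -i input.mp4 -c:v libx264 -preset veryfast -crf 24 -c:a copy out_crf24.mp4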
That's it. There are lots of options for the x264 encoder; you can check all the available options at this link:
H.264 Video Encoding Guide

FFMPEG: Properly sidechain_compress stereo background with stereo sidechain into stereo output

I'm doing a voiceover, and since Sony Vegas does not support sidechaining, I render the voiceover into voices.wav and then use the sidechaincompress filter, as per the ffmpeg documentation:
ffmpeg -y -i background.m4a -i voices.wav -filter_complex \
"[1:a]asplit=2[sc][mix];\
[0:a][sc]sidechaincompress=threshold=0.015:ratio=2:level_sc=0.8:release=500:attack=1[compr];\
[compr][mix]amerge" sidechain_1.wav
voices.wav is a stereo audio file, as is background.m4a. But here's how the result file looks when loaded into Sony Vegas:
This shows that in channels 1/2 I get the compressed background, while in channels 3 and 4 I get two mono tracks that somehow differ (probably the original voices input and a somewhat altered voices input, both in mono). UPD: I don't want to further process the resulting tracks in Sony Vegas; I'd prefer ffmpeg to be the last step in my production process. The screenshot above is for illustration purposes only.
1. Does the background get sidechain-compressed by only the left or the right channel of the voices? If so, how do I change that so it is compressed by both channels (some voices are panned left or right, so there may be an actual difference in the compressed result)?
2. What are those channels 3 and 4? Why are they mono?
3. How do I get a single 1/2 stereo track in the output wav file instead of these weird 4 channels in 3 tracks? (I've looked at the pan complex filter, but couldn't figure out how to set it up in my case.)
amerge adds the channels of the inputs. amix uses the channel count of the input with the most channels. So, switch to amix.
ffmpeg -y -i background.m4a -i voices.wav -filter_complex \
"[1:a]asplit=2[sc][mix];\
[0:a][sc]sidechaincompress=threshold=0.015:ratio=2:level_sc=0.8:release=500:attack=1[compr];\
[compr][mix]amix" sidechain_1.wav
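One caveat: by default amix scales its inputs down (with two inputs, each lands at roughly half volume). Recent FFmpeg builds expose a normalize option to disable that; a hedged variant, assuming your build supports it:
ffmpeg -y -i background.m4a -i voices.wav -filter_complex \
"[1:a]asplit=2[sc][mix];\
[0:a][sc]sidechaincompress=threshold=0.015:ratio=2:level_sc=0.8:release=500:attack=1[compr];\
[compr][mix]amix=normalize=0" sidechain_1.wav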

Merging two video streams and saving as one file

I'm writing a chat application with video calls using WebRTC. I have two MediaStreams, remote and local, and want to merge and save them as one file, so that when the file is opened, I see a large video frame (the remote stream) with a little video frame at the top right (the local stream). Right now I can record the two streams separately using RecordRTC. How can I merge them with Node.js? (No code, because I don't know how it's done.)
You can use FFmpeg with -filter_complex. Here is a working and tested example, using FFmpeg version N-62162-gec8789a:
ffmpeg -i main_video.mp4 -i in_picture.mp4 -filter_complex "[0:v:0]scale=640x480[main_video]; [1:v:0]scale=240x180[in_picture];[main_video][in_picture]overlay=390:10" output.mp4
So, this command tells FFmpeg to read from two input files, main_video.mp4 and in_picture.mp4, and then passes the work to the -filter_complex flag.
The -filter_complex flag takes [0:v:0] (first input, first video track) and scales it to 640x480 px, labelling the result [main_video]; then it takes [1:v:0] (second input, first video track) and resizes it to 240x180 px, labelling it [in_picture]; finally it merges the two by overlaying the second on the first at x=390, y=10.
Then it saves the output to output.mp4
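The command above only maps video. If both recordings also carry audio, a hedged variant mixes the two audio tracks into the output (this assumes each input file actually has an audio stream):
ffmpeg -i main_video.mp4 -i in_picture.mp4 -filter_complex \
"[0:v:0]scale=640x480[main_video];[1:v:0]scale=240x180[in_picture];\
[main_video][in_picture]overlay=390:10[v];\
[0:a][1:a]amix=inputs=2[a]" \
-map "[v]" -map "[a]" output.mp4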
Is that what you want?
UPDATE: I forgot to add that all you need in Node is a module to run FFmpeg; there are plenty of those:
https://nodejsmodules.org/tags/ffmpeg
