Normalizing audio in ffmpeg - how?

I'm creating one of those "Brady Bunch" videos for a choir using a C# application I'm writing that uses ffmpeg for all the heavy lifting, and for the most part it's working great but I'm having trouble getting the audio levels just right.
What I'm doing right now is first "normalizing" the audio from the individual singers like this:
Extract audio into a WAV file using ffmpeg
Load the WAV file into my application using NAudio
Find the maximum 16-bit value
When I create the merged video, specify a volume for this stream that boosts the maximum value to 32767
So, for example, if I have 3 streams: stream A's maximum audio is 32767 already, stream B's maximum audio is 32000, and stream C's maximum audio is 16000, then when I merge these videos I will specify
[0:a]volume=1.0,aresample=async=1:first_pts=0[aud0]
[1:a]volume=1.02,aresample=async=1:first_pts=0[aud1]
[2:a]volume=2.05,aresample=async=1:first_pts=0[aud2]
[aud0][aud1][aud2]amix=inputs=3[a]
(I have an additional "volume tweak" that lets me adjust the volume level of individual singers as necessary, but we can ignore that for this question)
I am reading the ffmpeg wiki on Audio Volume Manipulation, and I will implement that next, but I don't know what to do with the output it generates. It looks like I'm going to get mean and max volume levels in dB and while I understand decibels in a "yeah, I learned about those in college 30 years ago" kind of way, I don't know how to use those values to normalize the audio of my input videos.
The problem is, in the ffmpeg output video, the audio level is quite low. If I do the same process of extracting the audio from the merged video that ffmpeg generated and looking at the WAV file, the maximum value is only 4904.
How do I implement an algorithm that automatically sets the output volume to a "reasonable" level? I realize I can simply add a manual volume filter and have the human set the level, but that's going to be a lot of back & forth of generating the merged video, listening to it, adjusting the level, merging again, etc. I want a way where my application figures out an appropriate output volume (possibly with human adjustment allowed).
EDIT
Asking ffmpeg to determine the mean and max volume of each clip does provide mean and max volume in dB, and I can then use those values to scale each input clip:
[0:a]volume=3.40dB,aresample=async=1:first_pts=0[aud0]
[1:a]volume=3.90dB,aresample=async=1:first_pts=0[aud1]
[2:a]volume=4.40dB,aresample=async=1:first_pts=0[aud2]
[3:a]volume=-0.00dB,aresample=async=1:first_pts=0[aud3]
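(For reference: each gain above is just that clip's reported max_volume with the sign flipped, and a gain of G dB is a linear factor of 10^(G/20), so e.g. 6 dB is roughly a factor of 2. A quick sanity check, nothing ffmpeg-specific:)
awk 'BEGIN { db = 6.0; print 10^(db/20) }'   # prints ~1.995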
But my final video is still strangely quiet. For now, I've added a manually-entered volume factor that gets applied at the very end:
[aud0][aud1][aud2][aud3]amix=inputs=4[a]
[a]volume=volume=3.00[b]
So my question is, in effect, how do I determine algorithmically what this final volume factor needs to be?
MORE EDIT
There's something deeper going on here: I just set the volume filter to 100 and the output is only slightly louder. Here are my filters, and the relevant portions of the command line:
color=size=1920x1080:c=0x0000FF [base];
[0:v] scale=576x324 [clip0];
[0:a]volume=1.48,aresample=async=1:first_pts=0[aud0];
[1:v] crop=808:1022:202:276,scale=384x486 [clip1];
[1:a]volume=1.57,aresample=async=1:first_pts=0[aud1];
[2:v] crop=1160:1010:428:70,scale=558x486 [clip2];
[2:a]volume=1.66,aresample=async=1:first_pts=0[aud2];
[3:v] crop=1326:1080:180:0,scale=576x469 [clip3];
[3:a]volume=1.70,aresample=async=1:first_pts=0[aud3];
[4:a]volume=0.20,aresample=async=1:first_pts=0[aud4];
[5:a]volume=0.73,aresample=async=1:first_pts=0[aud5];
[6:v] crop=1326:1080:276:0,scale=576x469 [clip4];
[6:a]volume=1.51,aresample=async=1:first_pts=0[aud6];
[base][clip0] overlay=shortest=1:x=32:y=158 [tmp0];
[tmp0][clip1] overlay=shortest=1:x=768:y=27 [tmp1];
[tmp1][clip2] overlay=shortest=1:x=1321:y=27 [tmp2];
[tmp2][clip3] overlay=shortest=1:x=32:y=625 [tmp3];
[tmp3][clip4] overlay=shortest=1:x=672:y=625 [tmp4];
[aud0][aud1][aud2][aud3][aud4][aud5][aud6]amix=inputs=7[a];
[a]adelay=delays=200:all=1[b];
[b]volume=volume=100.00[c];
[c]asplit[a1][a2];
ffmpeg -y ....
-map "[tmp4]" -map "[a1]" -c:v libx264 "D:\voutput.mp4"
-map "[a2]" "D:\aoutput.mp3""
When I do this, the audio I want is louder (loud enough to clip and get distorted), but definitely not 100x louder.

After mixing the audio, run:
ffmpeg -i output.mp3 -filter:a volumedetect -map 0:a -f null /dev/null
Get the value from a line like this:
[Parsed_volumedetect_0 @ 0xdigitsletters] max_volume: -16.5 dB
Then add that value, sign flipped to positive, to your filters: ...]amix=inputs=7,volume=16.5dB[a]
[edit]
Do this after mixing the audio.
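A sketch that automates the two passes (filenames are placeholders; the awk field matches the volumedetect line above):
MAXDB=$(ffmpeg -hide_banner -i mixed.mp4 -map 0:a -filter:a volumedetect -f null /dev/null 2>&1 | awk '/max_volume/ {print $5}')
# ${MAXDB#-} strips the leading minus sign, so -16.5 becomes a boost of 16.5dB
ffmpeg -i mixed.mp4 -filter:a "volume=${MAXDB#-}dB" -c:v copy boosted.mp4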
[update]
I did some investigation:
[update 2]
#!/bin/bash
f="input 1.mp3"
INP=("-ss" "30" "-i" "$f")
FCT=1
FLA="[0:a:0]aresample=async=1:first_pts=0[0a0]; "
AUD="[0a0]"
MAX=$(ffmpeg -hide_banner -i "$f" -map 0:a -filter:a volumedetect -f null /dev/null 2>&1 | grep 'max_volume' | awk '{print $4, $5, $6}')
echo " $FCT $MAX"
for f in /mnt/sklad/Музыка/*.mp3; do
    INP+=("-ss" "30" "-i" "$f")
    FLA+="[${FCT}:a:0]aresample=async=1:first_pts=0[${FCT}a0]; "
    AUD+="[${FCT}a0]"
    ((FCT++))
    printf -v OUT "%02d" $FCT
    ffmpeg -v error -hide_banner "${INP[@]}" -filter_complex "${FLA} ${AUD}amix=inputs=${FCT}[a]" -map [a] -c:a aac -q:a 4 -t 30 -y "out_${OUT}.mkv"
    MAX=$(ffmpeg -hide_banner -i "$f" -map 0:a -filter:a volumedetect -f null /dev/null 2>&1 | grep 'max_volume' | awk '{print $5, $6}')
    echo " $FCT $MAX"
done
for f in out_*.mkv; do
    MAX=$(ffmpeg -hide_banner -i "$f" -map 0:a -filter:a volumedetect -f null /dev/null 2>&1 | grep 'max_volume' | awk '{print $5, $6}')
    echo " $f $MAX"
done
output:
1 max_volume: -1.1 dB
2 -0.2 dB
3 0.0 dB
4 -1.9 dB
5 -0.1 dB
6 -0.9 dB
7 0.0 dB
8 0.0 dB
9 0.0 dB
10 0.0 dB
11 0.0 dB
12 0.0 dB
13 -0.5 dB
14 -1.1 dB
15 0.0 dB
16 0.0 dB
17 -0.0 dB
out_02.mkv -4.4 dB
out_03.mkv -5.0 dB
out_04.mkv -6.8 dB
out_05.mkv -7.1 dB
out_06.mkv -8.3 dB
out_07.mkv -8.9 dB
out_08.mkv -8.9 dB
out_09.mkv -8.8 dB
out_10.mkv -8.9 dB
out_11.mkv -9.7 dB
out_12.mkv -10.3 dB
out_13.mkv -11.1 dB
out_14.mkv -11.3 dB
out_15.mkv -10.6 dB
out_16.mkv -10.9 dB
out_17.mkv -11.2 dB
This gives a different result, but there is still no strong pattern. Very roughly, though, the drop tracks 10*log10(n) dB: amix scales each input by 1/n, and the peak of a sum of n uncorrelated tracks only grows on the order of sqrt(n).

Related

Cut a video in between key frames without re-encoding the full video using ffmpeg?

I would like to cut a video at the beginning at any particular timestamp, and it needs to be precise, so the nearest key frame is not good enough.
Also, these videos are rather long - an hour or longer - so I would like to avoid re-encoding altogether if possible, or otherwise only re-encode a minimal fraction of the total duration. Thus, I would like to maximise the use of -vcodec copy.
How can I accomplish this using ffmpeg?
NOTE: See scenario, and my own rough idea for a possible solution below.
Scenario:
Original video
Length of 1:00:00
Has a key frame every 10s
Desired cut:
From 0:01:35 through till the end
Attempt #1:
Using -ss 0:01:35 -i blah.mp4 -vcodec copy, what results is a file where:
audio starts at 0:01:30
video also starts at 0:01:30
this starts both the audio and the video too early
using -i blah.mp4 -ss 0:01:35 -vcodec copy, what results is a file where:
audio starts at 0:01:35,
but the video is blank/ black for the first 5 seconds,
until 0:01:40, when the video starts
this starts the audio on time,
but the video starts too late
Rough idea
(1) cut 0:01:30 to 0:01:40
re-encode this to have new key frames,
including one at the target time of 0:01:35
then cut this to get the 5 seconds from 0:01:35 through 0:01:40
(2) cut 0:01:40 through till the end
without re-encoding, using -vcodec copy
(3) ffmpeg concat the first short clip (the 5 second one)
with the second long clip
I know/ can work out the commands for (2) and (3), but am unsure about what commands are needed for (1).
List timestamps of key frames:
ffprobe -v error -select_streams v:0 -skip_frame nokey -show_entries frame=pkt_pts_time -of csv=p=0 input.mp4
It will output something like:
0.000000
2.502000
3.795000
6.131000
10.344000
12.554000
16.266000
...
Let's say you want to delete timestamps 0 to 5, and then stream copy the remainder. The closest following key frame is 6.131.
Re-encode 5 to 6.131. Ensure the output matches the input's attributes and formats. For MP4, the default settings should do most of the work, assuming H.264/AAC, but you may have to manually match the profile.
ffmpeg -i input.mp4 -ss 5 -to 6.131 trimmed.mp4
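If the defaults don't match your source, something along these lines (the profile, level, and sample rate here are assumptions; read the real values off ffprobe for your file):
ffmpeg -i input.mp4 -ss 5 -to 6.131 -c:v libx264 -profile:v high -level 4.0 -pix_fmt yuv420p -c:a aac -ar 48000 trimmed.mp4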
Make input.txt for the concat demuxer:
file 'trimmed.mp4'
file 'input.mp4'
inpoint 6.131
Concatenate:
ffmpeg -f concat -i input.txt -c copy output.mp4
Try
ffmpeg -i src.mp4 -vcodec copy -reset_timestamps 1 -map 0 out.mp4
or
ffmpeg -i src.mp4 -vcodec copy -reset_timestamps 1 -map 0 src_.m3u8
which generates HLS playlists.

Using ffmpeg to split MP3 file to multiple equally sound length files

How do I use the command-line tool ffmpeg on Windows to split a sound file into multiple sound files, without changing the sound properties, so that each one is a fixed 30 seconds long? I got this manual example from here:
ffmpeg -i long.mp3 -acodec copy -ss 00:00:00 -t 00:00:30 half1.mp3
ffmpeg -i long.mp3 -acodec copy -ss 00:00:30 -t 00:00:30 half2.mp3
But is there a way to tell it to split the input file into equal-length sound files, each one 30 seconds long, with the last one being whatever length remains?
You can use the segment muxer.
ffmpeg -i long.mp3 -acodec copy -vn -f segment -segment_time 30 half%d.mp3
Add -segment_start_number 1 to start segment numbering from 1.
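Note that with -acodec copy the segment muxer can only cut on MP3 frame boundaries (1152 samples per frame, about 26 ms at 44.1 kHz), so each piece lands within one frame of 30 seconds. You can verify the resulting durations with, for example:
ffprobe -v error -show_entries format=duration -of csv=p=0 half1.mp3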

insert audio into another audio file (eg a censor bleep)

I need to insert a short beep into another audio file (similar to a censorship bleep) using linux and/or php.
I'm thinking there should be some way to do it with ffmpeg (with some combination of -t, concat, map, async, adelay, itsoffset?) or avconv or mkvmerge - but haven't found anyone doing this. Maybe I need to do it in 2 stages somehow?
For example if I have a 60 second mp3 and want to beep out 2 seconds at 2 places the desired result would be:
0:00-0:15 from original
0:15-0:17 beep (overwrites the 2 secs of original)
0:17-0:40 from original
0:40-0:42 beep
0:42-0:60 from original
I have a 2 second beep.mp3, but can use something else instead like -i "sine=frequency=1000:duration=2"
You can use the concat demuxer.
Create a text file, e.g.
file main.wav
inpoint 0
outpoint 15
file beep.wav
file main.wav
inpoint 17
outpoint 40
file beep.wav
file main.wav
inpoint 42
and then
ffmpeg -f concat -i list.txt out.mp3
Convert the beep file to have the same sampling rate and channel count as the main audio.
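For example, assuming the main track is 44.1 kHz stereo (check with ffprobe), you could prepare both files like this; the sine source is an alternative if you don't have a beep file:
ffmpeg -i original.mp3 -ar 44100 -ac 2 main.wav
ffmpeg -i beep.mp3 -ar 44100 -ac 2 beep.wav
ffmpeg -f lavfi -i "sine=frequency=1000:duration=2" -ar 44100 -ac 2 beep.wav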
First, make sure beep.mp3 is at least as long as each section you want to cover (2 seconds here); -t will trim it if it is longer.
Then you can slice each piece with -ss <start_time> -t <duration> before the corresponding -i, and join the pieces with the concat filter:
ffmpeg -ss 0 -t 15 -i ./original.mp3 -t 2 -i ./beep.mp3 -ss 17 -t 23 -i ./original.mp3 -t 2 -i ./beep.mp3 -ss 42 -i ./original.mp3 -filter_complex '[0:0][1:0][2:0][3:0][4:0]concat=n=5:v=0:a=1[out]' -map '[out]' ./output.mp3
At the end you will get an output.mp3 file as you needed.

ffmpeg concat drops audio frames

I have an mp4 file and I want to take two sequential sections of the video out and render them as individual files, later recombining them back into the original video. For instance, with my video video.mp4, I can run
ffmpeg -i video.mp4 -ss 56 -t 4 out1.mp4
ffmpeg -i video.mp4 -ss 60 -t 4 out2.mp4
creating out1.mp4 which contains 00:00:56 to 00:01:00 of video.mp4, and out2.mp4 which contains 00:01:00 to 00:01:04. However, later I want to be able to recombine them again quickly (i.e., without reencoding), so I use the concat demuxer,
ffmpeg -f concat -safe 0 -i files.txt -c copy concat.mp4
where files.txt contains
file out1.mp4
file out2.mp4
which theoretically should give me back 00:00:56 to 00:01:04 of video.mp4, however there are always dropped audio frames where the concatenation occurs, creating a very unpleasant sound artifact, an audio blip, if you will.
I have tried using async and -af apad on initially creating the two sections of the video but I am still faced with the same problem, and have not found the solution elsewhere. I have experienced this issue in multiple different use cases, so hopefully this simple example will shed some light on the real problem.
I suggest you export the segments to MOV with PCM audio, then concat those, re-encoding the audio.
ffmpeg -i video.mp4 -c:a pcm_s16le -ss 56 -t 4 out1.mov
...
and then
ffmpeg -f concat -safe 0 -i files.txt -c:v copy concat.mp4
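where files.txt now lists the MOV segments:
file out1.mov
file out2.mov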

FFMPEG Amix filter volume is not constant [duplicate]

I noticed that ffmpeg's amix filter doesn't produce good output in a specific situation. It works fine if the input files have equal duration: the volume is dropped by a constant factor and can be fixed with ",volume=2".
In my case I'm using files with different durations, and the resulting volume is not good. The first mixed stream ends up with the lowest volume and the last one with the highest. You can see in the image that the volume increases roughly linearly over time.
My command:
ffmpeg -i temp_0.mp4 -i user_2123_10.mp4 -i user_2123_3.mp4 -i user_2123_4.mp4
-i user_2123_7.mp4 -i user_2123_5.mp4 -i user_2123_1.mp4 -i user_2123_8.mp4
-i user_2123_0.mp4 -i user_2123_6.mp4 -i user_2123_9.mp4 -i user_2123_2.mp4
-i user_2123_11.mp4 -filter_complex "[1:a]adelay=34741.0[aud1];
[2:a]adelay=18241.0[aud2];[3:a]adelay=20602.0[aud3];
[4:a]adelay=27852.0[aud4];[5:a]adelay=22941.0[aud5];
[6:a]adelay=13142.0[aud6];[7:a]adelay=29810.0[aud7];
[8:a]adelay=12.0[aud8];[9:a]adelay=25692.0[aud9];
[10:a]adelay=32143.002[aud10];[11:a]adelay=16101.0[aud11];
[12:a]adelay=40848.0[aud12];
[0:a][aud1][aud2][aud3][aud4][aud5][aud6][aud7]
[aud8][aud9][aud10][aud11]
[aud12]amix=inputs=13:duration=first:dropout_transition=0"
-vcodec copy -y temp_1.mp4
That could be fixed by applying silence at the beginning and end of each clip; then they would all have the same duration and the volume would stay at the same level.
Please suggest how I can use amix to mix many inputs and keep a constant volume level.
amix scales each input's volume by 1/n, where n = the number of active inputs. This is evaluated for each audio frame. So when an input drops out, the remaining inputs are scaled by a smaller amount, hence their volumes increase. (With all 13 of your inputs active, each is attenuated by a factor of 13, about 22 dB; once only one remains, it is not attenuated at all.)
Changing the dropout_transition for all earlier inputs, as suggested in other answers, is one approach, but I think it will result in coarse volume modulations. Better method is to normalize the audio after the amix.
At present, you have two options: the loudnorm filter or the dynaudnorm filter. The latter is much faster.
The syntax is to add it after the amix, so
[aud11][aud12]amix=inputs=13:duration=first:dropout_transition=0,dynaudnorm"
Read the documentation if you wish to tweak parameters for maximum volume, RMS-mode normalization, etc.
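A minimal two-input sketch of that placement (filenames are placeholders):
ffmpeg -i a.mp4 -i b.mp4 -filter_complex "[0:a][1:a]amix=inputs=2:duration=first:dropout_transition=0,dynaudnorm[a]" -map 0:v -map "[a]" -c:v copy out.mp4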
Recent versions of FFmpeg include the normalize parameter for the amix filter, which you can use to turn off the constantly changing normalization; see the amix documentation for details.
Your amix filter string can be changed to:
[aud12]amix=inputs=13:normalize=0
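With that option there is no need for a compensating volume filter afterwards; a trimmed-down two-input sketch (filenames are placeholders):
ffmpeg -i a.mp4 -i b.mp4 -filter_complex "[0:a][1:a]amix=inputs=2:normalize=0[a]" -map 0:v -map "[a]" -c:v copy out.mp4
Note that with normalize=0 the inputs are summed as-is, so attenuate them first if the sum could clip.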
The solution I've found is to specify the volume for each track in "descending" order and use no normalization filter afterwards.
I use this example, where I mix the same audio file delayed to different positions:
ffmpeg -vn -i test.mp3 -i test.mp3 -i test.mp3 -filter_complex "[0]adelay=0|0,volume=3[a];[1]adelay=2000|2000,volume=2[b];[2]adelay=4000|4000,volume=1[c];[a][b][c]amix=inputs=3:dropout_transition=0" -q:a 1 -acodec libmp3lame -y amix-volume.mp3
For more detail, see this image: the first track is the normal mix, the second is the one with volumes specified, and the third is the original track. As you can see, the second track appears to have a normal volume.
ffmpeg -vn -i test.mp3 -i test.mp3 -i test.mp3 -filter_complex "[0]adelay=0|0[a];[1]adelay=2000|2000[b];[2]adelay=4000|4000[c];[a][b][c]amix=inputs=3:dropout_transition=0" -q:a 1 -acodec libmp3lame -y amix-no-volume.mp3
ffmpeg -vn -i test.mp3 -i test.mp3 -i test.mp3 -filter_complex "[0]adelay=0|0,volume=3[a];[1]adelay=2000|2000,volume=2[b];[2]adelay=4000|4000,volume=1[c];[a][b][c]amix=inputs=3:dropout_transition=0" -q:a 1 -acodec libmp3lame -y amix-volume.mp3
I can't really understand why amix changes the volume; anyway, I had been digging around for a while for a good solution.
The solution seems to be a combination of "pre-amp", or multiplication, as Maxim puts it, AND you have to set dropout_transition >= max delay + max input length (or a very high number):
amix=inputs=13:dropout_transition=1000,volume=13
Notes:
amix has to convert to float anyway, so there is no downside to adding the volume filter (which by default processes in float, too).
And since we're using floats, there's no clipping and (almost) no loss of precision.
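A minimal two-input sketch of that combination (filenames are placeholders):
ffmpeg -i a.wav -i b.wav -filter_complex "[0:a][1:a]amix=inputs=2:dropout_transition=1000,volume=2[a]" -map "[a]" out.wav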
Hat tip to @Mulvya for the analysis, but their solution is frustratingly non-mathematical.
I was originally trying to do this with sox, which was too slow. Sox's remix filter has the -m switch which disables the 1/n adjustment.
While faster, ffmpeg seems to be using way more memory for the same task. YMMV - I didn't test this thoroughly, because I finally settled on a small python script which uses pydub's overlay function, and only keeps the final output file and one segment in memory (whereas ffmpeg and sox seem to keep all of the segments in memory).
I got the same problem but found a solution!
First, the problem: I had to mix a background music file with 3 different TTS voice pieces that start with different delays. At the end, the background sound was extremely loud.
I tried the suggested answer but it did not work for me; the end volume was still much higher. So my thought was: "All inputs must have the same length, so that the same amount of audio is active in the mix at every moment."
apad on all TTS inputs with whole_len set, in combination with the -shortest option, did the trick for me (see the stripped-down sketch after the example call).
Example call:
ffmpeg -y
-nostats
-hide_banner
-v quiet
-hwaccel auto
-f image2pipe
-i pipe:0
-i bgAudio.aac
-i TTS1.mp3
-i TTS2.mp3
-i TTS3.mp3
-filter_complex [1:a]loudnorm=I=-16:TP=-1.5:LRA=11:linear=false[a0];[2:a]loudnorm=I=-16:TP=-1.5:LRA=11:linear=false:dual_mono=true,adelay=7680|7680,apad=whole_len=2346240[a1];[3:a]loudnorm=I=-16:TP=-1.5:LRA=11:linear=false:dual_mono=true,adelay=14640|14640,apad=whole_len=2346240[a2];[4:a]loudnorm=I=-16:TP=-1.5:LRA=11:linear=false:dual_mono=true,adelay=3240|3240,apad=whole_len=2346240[a3];[a0][a1][a2][a3]amix=inputs=4:dropout_transition=0,asplit=6[audio0][audio1][audio2][audio3][audio4][audio5];[0:v]format=yuv420p,split=6[1080p][720p][480p][360p][240p][144p]
-map [audio0] -map [1080p] -s 1920x1080 -shortest out1080p.mp4
-map [audio1] -map [720p] -s 1280x720 -shortest out720p.mp4
-map [audio2] -map [480p] -s 858x480 -shortest out480p.mp4
-map [audio3] -map [360p] -s 640x360 -shortest out360p.mp4
-map [audio4] -map [240p] -s 426x240 -shortest out240p.mp4
-map [audio5] -map [144p] -s 256x144 -shortest out144p.mp4
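A stripped-down sketch of the same idea, with one background track and one padded voice track (filenames are placeholders; whole_len=2646000 samples is 60 s at 44.1 kHz):
ffmpeg -i bgAudio.aac -i TTS1.mp3 -filter_complex "[1:a]adelay=5000|5000,apad=whole_len=2646000[a1];[0:a][a1]amix=inputs=2:duration=first:dropout_transition=0[a]" -map "[a]" out.m4a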
Hope this helps someone!
Try to use multiplication:
"amix=inputs="+ chunks.length + ":duration=first:dropout_transition=3,volume=" + chunks.length
Sorry for not sending the ffmpeg output.
In the end, we wrote a small utility in C++ for mixing the audio. But first we converted the MP4s to raw (PCM) format. That worked just fine for us, even though it requires additional HDD space for the raw intermediate files.
The code looks like this:
// Mix two 16-bit samples without hard clipping: map both into [0, 1),
// then combine with 2ab when both are quiet and 2(a+b) - 2ab - 1 otherwise.
short addSounds(short a, short b) {
    double da = a;
    da /= 65536.0;
    da += 0.5;   // da is now in [0, 1)
    double db = b;
    db /= 65536.0;
    db += 0.5;   // db is now in [0, 1)
    double z = 0;
    if (da < 0.5 && db < 0.5) {
        z = 2 * da * db;
    } else {
        z = 2 * (da + db) - 2 * da * db - 1;
    }
    z -= 0.5;        // back to [-0.5, 0.5)
    z *= 65536.0;    // and back to the 16-bit range
    return (short)z;
}
I will show you my code.
"amix="+inputs.size()+",volume="+(inputs.size()+1)/2+"[mixout]\""
I don't use dropout_transition=0 because it causes the problem you're seeing.
But I also found that the volume gets lower as the number of inputs increases,
so I make the volume louder.
Try changing the dropout transition to the duration of the first input:
duration=first:dropout_transition=_duration_of_the_first_input_in_seconds_
here is my ffmpeg command:
ffmpeg -y -i long.wav -i short.wav -filter_complex "[1:a]adelay=6000|6000[a1];[1:a]adelay=10000|10000[a2];[1:a]adelay=14000|14000[a3];[1:a]adelay=18000|18000[a4];[1:a]adelay=21000|21000[a5];[1:a]adelay=25500|25500[a6];[0:a][a1][a2][a3][a4][a5][a6]amix=inputs=7:duration=first:dropout_transition=32[aout]" -map "[aout]" -ac 2 -b:a 192k -ar 44100 output.mp3
You can see the two dropout transitions in the screenshot.
