If I use FFmpeg to encode an MP3 file, I read in the documentation that I can use -aq 0 for best quality, but I don't understand what it will actually do. Does it increase the audio bitrate (highest bitrate = best quality)? How does it create the best file? Do I still need to specify the bitrate then?
-aq is an alias for -qscale:a and it invokes the VBR mode for audio codecs that support VBR. The scale for MP3 (the LAME encoder) runs from 0 to 9, with 0 being best quality. You don't need to specify the bitrate.
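For example, a minimal sketch (filenames are placeholders):
ffmpeg -i input.wav -codec:a libmp3lame -qscale:a 0 output.mp3
This produces a VBR MP3 at LAME's highest quality setting with no bitrate specified.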
In our application, we are processing audio files using ffmpeg. Specifically, we use the NodeJS library fluent-ffmpeg (npm link).
Our audio files are generated from various text-to-speech providers. We recently noticed that when we converted audio using SSML to add pauses to the generated audio, the duration on the file was no longer correct. Upon further investigation, we noticed that the durations of the standard audio files were also incorrect, just more accurate overall due to the more consistent data. When we put a pause at the beginning of the audio, the estimate was at its worst, overshooting by a very large margin (e.g., a 25s audio clip would read as 3 minutes long, but skip to the end when playing past the 25s mark).
I did some searching and research into the structure of MP3 files, and it seems to me the issue is that the duration gets estimated by various audio players. Windows Media Player is an example, but Firefox's web player seems to do this as well. I tried changing the ffmpeg command from using .audioQuality(0), which sets ffmpeg to use VBR, to .audioBitrate(320), which tells ffmpeg to use a constant bitrate.
For reference, we are using libmp3lame, and the full command that gets run is the following, for the VBR and CBR cases respectively:
For VBR (broken durations): ffmpeg -i <URL> -acodec libmp3lame -aq 0 -f mp3 pipe:1
For CBR (correct duration): ffmpeg -i <URL> -acodec libmp3lame -b:a 320k -f mp3 pipe:1
Note: we then pipe the output to the requesting client application after sending the appropriate file headers, hence the pipe:1 output. The input is a cloud storage URL where the source file is located.
This fixes our problem and gives us a correct duration, and it makes sense to me why it would if the problem was that the duration was being estimated by some of these players / audio consumers. But this came at the cost of a significantly larger file size, which also makes sense to me. While testing, we found that compared to the same file in WAV, the VBR MP3 was about 10% of the WAV file size, while the CBR MP3 was still 50% of the WAV file size. This practically defeats the purpose of supporting the MP3 format for our use case, which is to have a smaller but slightly lossy alternative to the large WAV file.
While researching, I found that there can be ID3 tags in a chunk at the beginning of the MP3 file, specifying information that lets the consumer of the audio know the duration before it has processed the whole file. But I also found that there doesn't seem to be a standard tag, at least for duration; the standard tags cover things more like song title, album, artist, etc.
My question is: is there a way to get the proper duration onto an MP3 file, preferably via some ffmpeg mechanism, while still using VBR? Thanks!
FFmpeg does write a Xing header by default with duration info. However, that value is only known once the entire stream has been encoded, so ffmpeg has to seek back to the head of the output to write it. Since you're piping the output, that can't be done.
Write the file locally or to some seekable destination, and then upload.
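A rough sketch of that workaround, reusing the VBR command from the question:
# encode to a local, seekable file so ffmpeg can seek back and fill in the Xing header
ffmpeg -i <URL> -acodec libmp3lame -aq 0 -f mp3 /tmp/out.mp3
# ...then stream /tmp/out.mp3 to the client instead of using pipe:1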
This is my first time asking a question here on Stack Overflow.
I am stuck and really struggling with this. I am trying to make the audio in some of my MXF video files comply with the EBU R128 standard.
This means that the integrated loudness has to be -23 LUFS and must not deviate by more than 0.5 LU.
My current process
Watch_folder > Encoding to MXF > Output_folder
I need to make sure that when they reach the output folder, those MXF files are EBU R128 loudness compliant.
What I have done so far:
FFmpeg:
ffmpeg -i input.mxf -af loudnorm=I=-23:LRA=7:tp=-2:print_format=json -f null -
I got the result:
Input Integrated: -15.1 LUFS
Input True Peak: +0.0 dBTP
Input LRA: 17.1 LU
Input Threshold: -26.2 LUFS
Output Integrated: -17.1 LUFS
Output True Peak: -1.5 dBTP
Output LRA: 5.3 LU
Output Threshold: -27.6 LUFS
Normalization Type: Dynamic
Target Offset: +1.1 LU
Then I ran:
ffmpeg -i input.mxf -af loudnorm=I=-23:LRA=7:tp=-2:measured_I=-15.1:measured_LRA=17.1:measured_tp=0:measured_thresh=-27.6:offset=1.1 -ar 48k -y output.mxf
However, when I put it through the software Eff, it says that it's not EBU compliant.
EDIT:
This also reduces the quality. For example, my 6 GB file becomes 250 MB, and you can tell the quality has degraded.
ffmpeg-normalize
I did the following:
ffmpeg-normalize input.mxf -c:a pcm_s32le -ar 48000 -o output.mxf
but this gives me errors.
If I run it without specifying the output file type, I get an MKV, which will not work for me. I need it to be MXF.
OK, a few issues here.
Firstly, if your file is measured at -26.2 LUFS, you'd need to add 3.2 dB to get it to -23. But you can't do that, because your true peak is too high (you'd be over full scale). You'll need to compress the audio (dynamic range compression, not file/rate compression) or at least use a limiter to achieve this.
A good R128 audio track should be mixed properly rather than just run through a normaliser; otherwise you risk it either failing the standard or picking up unwanted audio artifacts.
If you don't have access to audio editing software or someone who can do this for you, then FFmpeg does include an audio limiter, which will give you enough headroom to raise the level to -23 LUFS.
You can do that with something like this:
-filter_complex alimiter=level_in=1:level_out=1:limit=1.5:attack=7:release=100:level=disabled
However, tuning a limiter well depends on what the video file is of (music, speech, etc) and it is something that's worth taking some time over. Alter the attack and release values until you get the result you want.
Secondly, the reason FFmpeg has produced a smaller file of lower quality is that you didn't specify anything for the video stream. FFmpeg's default action with video is (usually) to encode to H.264, so whatever your codec is here (I am assuming DNxHD, given that you're using an MXF wrapper) needs to be specified. FFmpeg will, however, copy the video stream and leave it alone if you include the option -c:v copy (which basically means "copy the video codec").
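Putting both points together, a minimal sketch might look like the following. One caveat: in the FFmpeg documentation I'm aware of, alimiter's limit option is a linear value capped at 1.0, so 0.8 (roughly -2 dBFS) is used here instead of 1.5; as above, the values are illustrative, not tuned:
ffmpeg -i input.mxf -c:v copy \
  -af "alimiter=level_in=1:level_out=1:limit=0.8:attack=7:release=100:level=disabled" \
  -ar 48000 output.mxf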
Post your results once you have tried these...!
Say I have a bunch of MP3 files. How would I go about using a command-line audio tool to completely silence one side of the audio file (the right), leaving the left side intact? I would then like to save this to a new MP3 file. This needs to be done entirely over the command line.
As another approach: is it possible to use a command-line audio tool to convert a stereo MP3 file to mono, then merge this mono file with a "silent" track of the same length, creating a left-headphone track with sound and a right-headphone track with silence?
In this SO question, there seem to be a number of approaches to a rather eccentric end goal. In the first possible solution, I just want to decrease the volume of the right side. In the second, I want to combine a few more common steps to achieve the same end result.
The problems here are that:
I can't find a good command-line tool for modifying audio files, even for the second approach, which should be a more common request.
I'm expecting that I'll first need to convert the MP3 file to WAV, using a similar or second tool.
This query is eccentric, so there aren't many links about it on the web.
Thanks for any help. Audacity would be my go-to normally, but it appears to be GUI only.
SoX lets you do this very easily.
The first case, a muted right channel (remix 1 0 maps input channel 1 to the left output and silences the right):
sox test.mp3 test-rmuted.mp3 remix 1 0
The second case, summed mono on the left channel (remix 1,2 sums input channels 1 and 2 into the left output; the trailing 0 silences the right):
sox test.mp3 test-lmono.mp3 remix 1,2 0
To batch process you could just do a simple for loop.
Muted right channel:
for f in *.mp3
do
basename="${f%.*}"
echo "$basename"
sox "$f" -t wav - remix 1 0 | \
lame --preset standard - "00-${basename}-rmute".mp3
done
Summed mono on left channel only:
for f in *.mp3
do
basename="${f%.*}"
echo "$basename"
sox "$f" -t wav - remix 1,2 0 | \
lame --preset standard - "00-${basename}-lmono".mp3
done
You can forgo LAME and do the encoding with SoX as in the first two examples, but I find this method simpler and more flexible.
As suggested in a comment, you should be able to use FFmpeg to process your audio files. Dropping one channel completely will produce a different result than converting to mono first. However, I think either could be achieved with the pan filter in FFmpeg.
https://trac.ffmpeg.org/wiki/AudioChannelManipulation
https://ffmpeg.org/ffmpeg-filters.html#pan
Attenuation of one channel (see the sketch after this list):
1. Decode the MP3 file to WAV
2. Create a new stereo WAV file using the pan filter, panned 100% to one channel
3. Encode the resulting WAV file to MP3
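With FFmpeg the three steps can be collapsed into one command; a hedged sketch (filenames are placeholders):
ffmpeg -i in.mp3 -af "pan=stereo|c0=c0|c1=0*c1" right-muted.mp3
Here c0 (left) passes through unchanged and c1 (right) gets zero gain.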
Mixing both channels evenly into one channel, then attenuating the other (again, see the sketch below):
1. Decode the MP3 file to WAV
2. Create a new WAV file using the pan filter, with one channel taking 50% from the left and 50% from the right, and the other channel at 0 gain
3. Encode the resulting WAV file to MP3
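Again as a single hedged FFmpeg sketch (filenames are placeholders):
ffmpeg -i in.mp3 -af "pan=stereo|c0=0.5*c0+0.5*c1|c1=0*c1" left-mono.mp3
The left output channel receives an even mix of both inputs while the right stays silent.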
I'm streaming a few RTMP streams through nginx, and I want to check every few seconds which stream has the highest volume.
Specifically, these streams are of talking heads. I assume that usually only one of them is speaking at a time, and I'm trying to find which one.
Since nginx can output HLS (Apple HTTP Live Streaming), I decided to check the last segment of each stream every few seconds using ffmpeg.
Example:
ffmpeg -f mp3 -i /my/path/camera67/123.ts -af "volumedetect" -f null /dev/null
For some reason max_volume is always zero (max_volume: 0.0 dB), and mean_volume seems meaningless with regard to the actual volume.
Do you have any idea why it's always zero?
Is there a helpful way to understand mean_volume?
Can you think of a different tool that may give me the volume (e.g. mediainfo or ffprobe)?
I also tried:
ffmpeg -f lavfi -i amovie=/my/path/camera67/123.ts,volumedetect
This time I got:
[mpegts @ 0x130bf40] start time for stream 1 is not set in estimate_timings_from_pts
[mpegts @ 0x130bf40] Could not find codec parameters for stream 1 (Audio: aac ([15][0][0][0] / 0x000F), 0 channels, fltp): unspecified sample rate
Consider increasing the value for the 'analyzeduration' and 'probesize' options
[Parsed_amovie_0 @ 0x130bcc0] No audio stream with index '-1' found
[lavfi @ 0x130abc0] Error initializing filter 'amovie' with args '/my/path/camera67/123.ts'
amovie=/my/path/camera67/123.ts,volumedetect: Invalid argument
Any idea?
Thanks,
T.
So here's what happened.
I streamed MP3 to nginx, which packaged the input into HLS segments, and those don't support MP3.
Listening to the RTMP output made me think the audio was working fine, but when I listened to the HLS output I heard nothing.
I changed my original stream to AAC; then the HLS stream gave the right output, and I immediately saw a correlation between the music and the mean and max volumes.
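For reference, once the segments carry decodable audio, a plain volumedetect pass without forcing -f mp3 should report sensible values, something like:
ffmpeg -i /my/path/camera67/123.ts -af volumedetect -f null /dev/null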
Thank you all.
I'm looking for an audio format where a silence of a couple of hours at the beginning does not affect the overall file size. Does anyone have any idea which one to use and what settings I have to use? I have tried M4A, Ogg, and MP3 so far with no luck. An audio sample with 4 hours of silence at the beginning leads to a 400 MB file in some formats.
Of course, dealing with it programmatically would be the more sensible and SO way: something like SoX and its silence/pad effects. After all, any bit of silence is identical to any other bit of silence, so trying to compress it is a bit of a waste of effort.
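For instance, a minimal SoX sketch (filenames are placeholders): store only the audible part and prepend the silence on demand with the pad effect:
sox voice.wav padded.wav pad 14400
# pad 14400 inserts 14400 seconds (4 hours) of silence at the start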
Having said that, I was a little curious about this myself so I had a go at comparing how well the different codecs fared at compressing pure digital silence.
I created two test files. The first was a 44.1 kHz, 16-bit, 30-minute-long stereo WAVE file containing uncorrelated brown noise at -10.66 dBFS RMS. The second file was the same, except padded with 210 minutes of silence, making the total duration 240 minutes (4 hours). Next, I encoded the files to various lossy and lossless codecs and looked at the size difference between the padded and unpadded files to gauge how efficiently the silence was encoded.
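For the curious, test files along these lines can be generated with SoX. A simplified mono sketch (the gain value is illustrative, not the exact -10.66 dBFS RMS used here):
sox -n noise.wav synth 1800 brownnoise gain -10
sox noise.wav noise-silence.wav pad 12600
# pad 12600 prepends 210 minutes (12600 s) of silence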
codec     noise (MB)   noise+silence (MB)   diff (MB)   ratio
wav 317.5 2540.0 2222.5 8.0
he-aac 14.6 116.5 101.9 8.0
vorbis 36.4 237.1 200.7 6.5
mp3 38.2 217.2 179.0 5.7
opus 27.0 81.6 54.6 3.0
tta 213.8 544.1 330.3 2.5
aac 54.0 131.7 77.7 2.4
wv 211.3 444.1 232.8 2.1
alac 212.5 393.7 181.2 1.9
flac 211.5 404.8 193.3 1.9
als 209.7 384.2 174.5 1.8
ofr 209.3 356.9 147.6 1.7
Codecs used:
Lossless
wav: WAVE
tta: True Audio v3.4.1
wv: WavPack v4.80.0 (wavpack -x)
alac: Apple Lossless
ofr: OptimFROG v5.100 (ofr --preset 2)
als: MPEG-4 Audio Lossless Coding v23 (mp4alsRM23 -a -b -o50)
flac: Free Lossless Audio Codec v1.3.1 (flac -8)
Lossy VBR
mp3: LAME MP3 v3.99.5 (lame -h -V2)
opus: Opus v1.1.2 (opusenc --bitrate 128 --framesize 40)
aac: Advanced Audio Codec v2.0 (afconvert -f 'm4af' -d aac -q 127 -s 3 -u vbrq 100)
vorbis: Vorbis aoTuV b5.5 (oggenc -q 5)
Lossy CBR
he-aac: High-Efficiency AAC v1 (afconvert -f 'm4af' -d aach -q 127 -s 0 -b 64000)
If you encode your audio file in .wav format, then according to the "Multimedia Programming Interface and Data Specifications 1.0" (pages 56-60) you can encode, instead of the usual single "data" chunk, a "LIST" chunk of type 'wavl' alternating "data" and "slnt" chunks. For an interpretation of the obscure (and buggy) specification, refer to the Wikipedia page on the WAV format.
I'm not sure whether this helps, but if the size causes problems in storage or transfer, you can simply ZIP the WAV and voilà! All the empty bytes disappear.
To use it, you have to unpack it again, though.
You might consider hacking the encoder to "pause" when it encounters more than a second or so of silence. Any of the codecs out there can be hacked to do this, though you will need to understand how they work before starting on changes like that...
Another option is to pipe the output of an MP3 encoder through a program that strips out "extra" silent frames. That might be less overall work (though you're still going to have to understand how MP3 framing & the Layer III bit reservoir work).