I'm creating webm files by recording in the browser. However, the duration isn't written to the final file because of how the sound is captured (the recording is cut off at an arbitrary time when some other process ends).
I can get the correct duration by running:
$ ffmpeg -i filename.webm -f null -
$ [snip] size=N/A time=00:00:02.39 bitrate=N/A
Is there a way to fix the missing duration? i.e. inserting the time metadata back into the file without transcoding (or otherwise losing quality).
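If the measurement above needs to be scripted, the last time= field that ffmpeg prints can be converted to seconds. This is a minimal Python sketch, assuming the progress line has the HH:MM:SS.xx shape shown above. The function name parse_ffmpeg_time is made up for illustration. It only reads the duration and does not write anything back into the file.

```python
import re

def parse_ffmpeg_time(stderr_text):
    """Return the last 'time=HH:MM:SS.xx' value in ffmpeg's output as seconds, or None."""
    matches = re.findall(r"time=(\d+):(\d{2}):(\d{2}(?:\.\d+)?)", stderr_text)
    if not matches:
        return None
    hours, minutes, seconds = matches[-1]
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)

# Sample line copied from the output shown in the question.
sample = "size=N/A time=00:00:02.39 bitrate=N/A"
print(parse_ffmpeg_time(sample))
```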
In our application, we are processing audio files using ffmpeg. Specifically, we use the NodeJS library fluent-ffmpeg.
Our audio files are generated by various text-to-speech providers. We recently noticed that when we used SSML to add pauses to the generated audio, the duration reported for the file was no longer correct. Upon further investigation, we noticed that the durations of the standard audio files were also incorrect, just closer overall because their data is more uniform. When we put a pause at the beginning of the audio, the estimate was the worst, overshooting by a very large margin (e.g., a 25s audio clip would read as 3 minutes long, but skip to the end when playing past the 25s mark).
I did some searching and research into the structure of MP3 files, and to me it seems like the issue is because the duration gets estimated by various audio players. Windows media player is an example, but Firefox's web player seems to also do this. I tried changing the ffmpeg command from using .audioQuality(0), which sets ffmpeg to use VBR, to .audioBitrate(320), which tells ffmpeg to use a constant bitrate.
For reference, we are using libmp3lame, and the full commands that get run for the VBR and CBR cases respectively are the following:
For VBR (broken durations): ffmpeg -i <URL> -acodec libmp3lame -aq 0 -f mp3 pipe:1
For CBR (correct duration): ffmpeg -i <URL> -acodec libmp3lame -b:a 320k -f mp3 pipe:1
Note: we then pipe the output to the requesting client application after sending the appropriate file headers, hence the pipe:1 output. The input is a cloud storage URL where the source file is located.
This fixes our problem of having a correct duration, and it makes sense to me why this would fix it if the problem was because the duration is being estimated by some of these players / audio consumers. But, this came at the cost that the file size was significantly larger, which also makes sense to me. While testing we found that compared to the same file in WAV, the VBR mp3 was about 10% of the WAV file size, while the CBR mp3 was still 50% of the WAV file size. This practically defeats the purpose of supporting the mp3 format for our use-case, which is a smaller but slightly lossy alternative to the large WAV file.
While researching, I found that there can be ID3 tags in a chunk at the beginning of the mp3 file, giving the consumer of the audio information such as the duration before it has processed the whole file. But I also found that there doesn't seem to be a standard, at least for duration; the tags mostly cover things like song title, album, artist, etc.
My question is, is there a way to get the proper duration onto an mp3 file, preferably via some ffmpeg mechanism, while still using VBR? Thanks!
FFmpeg does write a Xing header by default with duration info. However, that value is only known after the entire stream data has been received, so ffmpeg has to seek to the head to write it. Since you're piping the output, that can't be done.
Write the file locally or to some seekable destination, and then upload.
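One way to follow this advice is to encode into a temporary file and only then send the bytes to the client. This is a rough Python sketch, not the asker's NodeJS code. The helper names build_vbr_command and encode_then_stream are hypothetical, and the ffmpeg options are the ones from the VBR command in the question, plus -y to overwrite the empty temporary file.

```python
import os
import subprocess
import sys
import tempfile

def build_vbr_command(source, destination):
    """ffmpeg arguments for a VBR MP3 written to a real (seekable) file instead of pipe:1."""
    return ["ffmpeg", "-y", "-i", source,
            "-acodec", "libmp3lame", "-aq", "0", "-f", "mp3", destination]

def encode_then_stream(source, out=None, chunk_size=64 * 1024):
    """Encode to a temporary file so ffmpeg can seek back and write the Xing header,
    then copy the finished file to `out` (stdout by default)."""
    if out is None:
        out = sys.stdout.buffer
    fd, path = tempfile.mkstemp(suffix=".mp3")
    os.close(fd)
    try:
        subprocess.run(build_vbr_command(source, path), check=True)
        with open(path, "rb") as encoded:
            for chunk in iter(lambda: encoded.read(chunk_size), b""):
                out.write(chunk)
    finally:
        os.remove(path)
```

The cost is that the client receives nothing until encoding has finished, so this trades latency for a correct duration.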
Good Day,
I would like to know if it is possible to "join" a portion of an mp3 file to another without re-encoding using ffmpeg. I need to prepend an audio mp3 file with silence to ensure it is 60 seconds long.
For example, if my audio file a.mp3 is 40 seconds long, I need to prepend 20 seconds of silence without re-encoding.
My thought was to have a 60-second silent mp3 (silence.mp3) with the same sample rate and constant bitrate as my audio (44100 Hz and 40 kbps). I then need to "trim" this file and concat/join it with the audio file (a.mp3) appropriately.
I have a linux script that computes the required seconds to prepend and I tried using the following filter_complex expression:
ffmpeg -i silence.mp3 -i a.mp3 -filter_complex "[1]adelay=20000[b];[0][b]amix=2" out.mp3
This works, but it re-encodes the audio, which takes too long to process. I'm looking for a solution without re-encoding that can just join the correctly sized portion of silence.mp3 to a.mp3. The command would need to take the length of silence to use from silence.mp3 as a parameter.
Any advice is appreciated.
Your requirement is to not re-encode and yet that's what your method does.
Let's say you have a silent MP3 of the required duration ready.
Create a text file, list.txt
file silence.mp3
outpoint 20
file main.mp3
and join
ffmpeg -f concat -i list.txt -c copy merged.mp3
I assume the properties of silence.mp3 match the main audio file, in terms of channel count and sampling rate.
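Since the length of silence varies per file, the list file can be generated. This is a small Python sketch of that step, assuming the target length and the main file's duration are already known in seconds. The function name concat_list is made up, and paths containing single quotes are not handled.

```python
def concat_list(silence_path, main_path, target_seconds, main_seconds):
    """Build the text of a concat-demuxer list that pads main_path up to target_seconds."""
    pad = target_seconds - main_seconds
    if pad <= 0:
        # Already long enough: no silence entry needed.
        return f"file '{main_path}'\n"
    return (
        f"file '{silence_path}'\n"
        f"outpoint {pad:g}\n"   # applies to the file entry directly above it
        f"file '{main_path}'\n"
    )

# 40-second main file padded to 60 seconds, as in the question.
print(concat_list("silence.mp3", "main.mp3", 60, 40), end="")
```

Writing the returned text to list.txt and running the ffmpeg command above would then do the join.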
I have a bunch of mkv files, with FLAC as the audio codec and FFV1 as the video one.
The files were created using an EasyCap acquisition dongle from an analog VCR source. Specifically, I used VLC's "open acquisition device" prompt and selected PAL. Then, I converted the files (audio PCM, video raw YUV) to (FLAC, FFV1) using
ffmpeg.exe -i input.avi -acodec flac -vcodec ffv1 -level 3 -threads 4 -coder 1 -context 1 -g 1 -slices 24 -slicecrc 1 output.mkv
Now, the files are progressively out of sync. It may be due to the fact that while (maybe) the video has a constant framerate, the FLAC track has variable framerate. So, is there a way to sync the track to audio, or something alike? Can FFmpeg do this? Thanks
EDIT
On Mulvya's hint, I plotted the difference in sync at various times; the first column shows the seconds elapsed, the second shows the difference in seconds. The plot seems to behave linearly, with a constant slope of 0.0078. NOTE: the measurements were taken by hand with a stopwatch.
EDIT 2
Playing around with VirtualDub, I found that changing the framerate to 25 fps from the original 24.889 (Video->Frame rate...->Change frame rate to) and using the audio track converted to wav definitely does work. There are two problems, though: VirtualDub crashes when importing the original FFV1-FLAC mkv file, so I had to convert the video to H264 to try it out; moreover, I find it difficult to use an external encoder to save VirtualDub's output.
So, could I avoid using VirtualDub, and simply use ffmpeg for it? Here's the exported vdscript:
VirtualDub.audio.SetSource("E:\\4_track2.wav", "");
VirtualDub.audio.SetMode(0);
VirtualDub.audio.SetInterleave(1,500,1,0,0);
VirtualDub.audio.SetClipMode(1,1);
VirtualDub.audio.SetEditMode(1);
VirtualDub.audio.SetConversion(0,0,0,0,0);
VirtualDub.audio.SetVolume();
VirtualDub.audio.SetCompression();
VirtualDub.audio.EnableFilterGraph(0);
VirtualDub.video.SetInputFormat(0);
VirtualDub.video.SetOutputFormat(7);
VirtualDub.video.SetMode(3);
VirtualDub.video.SetSmartRendering(0);
VirtualDub.video.SetPreserveEmptyFrames(0);
VirtualDub.video.SetFrameRate2(25,1,1);
VirtualDub.video.SetIVTC(0, 0, 0, 0);
VirtualDub.video.SetCompression();
VirtualDub.video.filters.Clear();
VirtualDub.audio.filters.Clear();
The first line imports the wav-converted audio track.
Can I set an equivalent pipe in ffmpeg (possibly, using FLAC - not wav)? SetFrameRate2 is maybe the key, here.
I'm streaming a few RTMP streams through nginx and I want to check every few seconds which stream has the highest volume.
Specifically these streams are of talking heads and I assume that usually only one of them is speaking at a time, and I'm trying to find which one.
Since nginx can output hls (Apple http live streaming) I decided to check every few seconds the last segment of each stream using ffmpeg.
Example:
ffmpeg -f mp3 -i /my/path/camera67/123.ts -af "volumedetect" -f null /dev/null
For some reason the max_volume is always zero (max_volume: 0.0 dB) and mean_volume seems meaningless regarding the volume.
Do you have any idea why it's always zero?
Is there a helpful way to understand mean_volume?
Can you think of a different tool that may give me the volume (e.g. mediainfo or ffprobe)?
I also tried:
ffmpeg -f lavfi -i amovie=/my/path/camera67/123.ts,volumedetect
This time I got:
[mpegts @ 0x130bf40] start time for stream 1 is not set in estimate_timings_from_pts
[mpegts @ 0x130bf40] Could not find codec parameters for stream 1 (Audio: aac ([15][0][0][0] / 0x000F), 0 channels, fltp): unspecified sample rate
Consider increasing the value for the 'analyzeduration' and 'probesize' options
[Parsed_amovie_0 @ 0x130bcc0] No audio stream with index '-1' found
[lavfi @ 0x130abc0] Error initializing filter 'amovie' with args '/my/path/camera67/123.ts'
amovie=/my/path/camera67/123.ts,volumedetect: Invalid argument
Any idea?
Thanks,
T.
So that's what happened.
I was streaming MP3 to nginx, which converted the input to HLS segments, and those don't support MP3.
Listening to the RTMP output made me think the audio was working fine, but when I listened to the HLS output I heard nothing.
I changed my original stream to AAC; the HLS stream then gave the right output and I immediately saw a correlation between the music and the mean and max volumes.
Thank you all.
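Once volumedetect reports sensible numbers, picking the loudest stream is a matter of parsing its summary lines. This is a Python sketch under a few assumptions: each stream's ffmpeg stderr has already been captured as text, the summary lines have the "mean_volume: X dB" and "max_volume: X dB" shape quoted in the question, and the camera names and sample levels below are invented for illustration.

```python
import re

VOLUME_RE = re.compile(r"(mean_volume|max_volume):\s*(-?(?:\d+(?:\.\d+)?|inf))\s*dB")

def parse_volumedetect(stderr_text):
    """Extract mean_volume and max_volume (in dB) from volumedetect's summary lines."""
    return {name: float(value) for name, value in VOLUME_RE.findall(stderr_text)}

def loudest(streams):
    """streams maps a stream name to its ffmpeg stderr text.
    Returns the name with the highest mean_volume; streams without a reading rank last."""
    levels = {
        name: parse_volumedetect(text).get("mean_volume", float("-inf"))
        for name, text in streams.items()
    }
    return max(levels, key=levels.get)

# Hypothetical captured output for two streams.
example = {
    "camera67": "mean_volume: -27.5 dB\nmax_volume: -4.0 dB\n",
    "camera68": "mean_volume: -40.1 dB\nmax_volume: -12.3 dB\n",
}
print(loudest(example))
```

Comparing mean_volume rather than max_volume is a choice made here, on the guess that a single click or pop could dominate the peak reading.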
There are audio tracks of different lengths in m4a format, and there's the ffmpeg library for working with the media. Many of the tracks fade out at the end, and it is necessary to determine at what point that occurs (it is determined once and the value is entered in the database along with other information about the track). That is, we must somehow determine the point where the track begins to fade and its volume has dropped to 30% of the overall volume of the song. Is it possible to solve this with ffmpeg, and if so, how?
If you run this command,
ffmpeg -i in.mp4 -af astats=metadata=1:reset=1,ametadata=print:key=lavfi.astats.Overall.RMS_level:file=vol.log -vn -f null -
it will generate a file called vol.log which looks like this
frame:8941 pts:9155584 pts_time:190.741
lavfi.astats.Overall.RMS_level=-79.715762
frame:8942 pts:9156608 pts_time:190.763
lavfi.astats.Overall.RMS_level=-83.973798
frame:8943 pts:9157632 pts_time:190.784
lavfi.astats.Overall.RMS_level=-90.068668
frame:8944 pts:9158656 pts_time:190.805
lavfi.astats.Overall.RMS_level=-97.745197
frame:8945 pts:9159680 pts_time:190.827
lavfi.astats.Overall.RMS_level=-125.611266
frame:8946 pts:9160704 pts_time:190.848
lavfi.astats.Overall.RMS_level=-inf
frame:8947 pts:9161728 pts_time:190.869
lavfi.astats.Overall.RMS_level=-inf
The pts_time is the timestamp and the RMS level is the mean volume of that interval (21 ms here). Each drop of 6 dB corresponds to a halving of the amplitude.
If you run the command with reset=0, the last reading in the generated log file will show the RMS volume for the whole file. A level that is 30% of that mean is 20*log10(0.3) ≈ -10.5 dB, so the threshold you are looking for is ~10.5 dB below the mean value.
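The threshold arithmetic and the search through vol.log can be sketched in Python. This assumes the log has the two-line-per-frame layout shown above and that the whole-file RMS level has already been read from a reset=0 run. The function names are made up, and "fade start" is taken here to mean the start of the final run of frames that stay below the threshold, which is one possible definition.

```python
import math
import re

ENTRY_RE = re.compile(r"pts_time:(\S+)\s+lavfi\.astats\.Overall\.RMS_level=(\S+)")

def ratio_to_db(ratio):
    """Convert an amplitude ratio to decibels (0.5 gives about -6 dB)."""
    return 20 * math.log10(ratio)

def read_levels(log_text):
    """Parse vol.log text into (pts_time, rms_db) pairs; '-inf' becomes float('-inf')."""
    return [(float(t), float(level)) for t, level in ENTRY_RE.findall(log_text)]

def fade_start(levels, overall_rms_db, ratio=0.3):
    """Time of the first frame in the trailing run that stays below ratio * overall level.
    Returns None if the last frame is still at or above the threshold."""
    threshold = overall_rms_db + ratio_to_db(ratio)
    start = None
    for time, level in reversed(levels):
        if level >= threshold:
            break
        start = time
    return start

# Invented log excerpt in the same layout as vol.log.
demo_log = """frame:1 pts:1 pts_time:1.0
lavfi.astats.Overall.RMS_level=-20.0
frame:2 pts:2 pts_time:2.0
lavfi.astats.Overall.RMS_level=-22.0
frame:3 pts:3 pts_time:3.0
lavfi.astats.Overall.RMS_level=-35.0
frame:4 pts:4 pts_time:4.0
lavfi.astats.Overall.RMS_level=-inf
"""
print(fade_start(read_levels(demo_log), overall_rms_db=-21.0))
```

With 21 ms frames the per-frame levels are noisy, so in practice some smoothing over several frames may be needed before applying the threshold.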