FFmpeg: Get volume of frame or short timespan - audio

I want to make a program that automatically deletes frames from an mp4 when the volume for those frames is below a given threshold. How can I get the volume of each frame? Or, alternatively, the volume at time t in the video? Or over a timespan dt?
(I'm not committed to FFmpeg yet, or even to the mp4 format, so feel free to suggest alternatives.)
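A hedged starting point: ffmpeg's silencedetect audio filter reports spans where the level stays below a threshold for a minimum duration, which you could then turn into a cut list. A minimal sketch (the -30dB threshold and 0.5s minimum duration are illustrative values, not recommendations):
ffmpeg -i input.mp4 -af silencedetect=noise=-30dB:d=0.5 -f null - 2> silence.log
The silence_start and silence_end lines written to silence.log mark the timespans below the threshold.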

Related

Is there a way to ensure mp3 duration accuracy with variable bit rate using FFMPEG?

In our application, we are processing audio files using ffmpeg. Specifically, we use the NodeJS library fluent-ffmpeg (npm link).
Our audio files are generated by various text-to-speech providers. We recently noticed that when we converted audio using SSML to add pauses to the generated audio, the duration reported for the file was no longer correct. Upon further investigation, we noticed that the durations of the standard audio files were also incorrect, just closer overall due to the more consistent data. When we put a pause at the beginning of the audio, the estimate was the worst, overshooting by a very large margin (e.g., a 25s audio clip would read as 3 minutes long, but skip to the end when played past the 25s mark).
I did some searching and research into the structure of MP3 files, and it seems to me the issue is that the duration gets estimated by various audio players. Windows Media Player is one example, but Firefox's web player seems to do this as well. I tried changing the ffmpeg command from .audioQuality(0), which sets ffmpeg to use VBR, to .audioBitrate(320), which tells ffmpeg to use a constant bitrate.
For reference, we are using libmp3lame, and the full command that gets run is the following, for the VBR and CBR cases respectively:
For VBR (broken durations): ffmpeg -i <URL> -acodec libmp3lame -aq 0 -f mp3 pipe:1
For CBR (correct duration): ffmpeg -i <URL> -acodec libmp3lame -b:a 320k -f mp3 pipe:1
Note: we then pipe the output to the requesting client application after sending the appropriate file headers, hence the pipe:1 output. The input is a cloud storage URL where the source file is located.
This fixes our duration problem, and it makes sense to me why it would if the durations are being estimated by some of these players / audio consumers. But it came at the cost of a significantly larger file: while testing we found that, compared to the same file in WAV, the VBR mp3 was about 10% of the WAV file size, while the CBR mp3 was still 50% of the WAV file size. That practically defeats the purpose of supporting the mp3 format for our use case, which is to offer a smaller but slightly lossy alternative to the large WAV file.
While researching, I found that there can be ID3 tags in a chunk at the beginning of the mp3 file, giving the consumer of the audio information such as the duration before it has processed the whole file. But I also found that there doesn't seem to be a standard tag for duration; ID3 is more for things like song title, album, artist, etc.
My question is, is there a way to get the proper duration onto an mp3 file, preferably via some ffmpeg mechanism, while still using VBR? Thanks!
FFmpeg does write a Xing header by default with duration info. However, that value is only known after the entire stream data has been received, so ffmpeg has to seek to the head to write it. Since you're piping the output, that can't be done.
Write the file locally or to some seekable destination, and then upload.
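For example (a sketch; the temp path and the final send/upload step are placeholders for your own plumbing):
# write to a seekable file so ffmpeg can rewind and fill in the Xing header
ffmpeg -i <URL> -acodec libmp3lame -aq 0 -f mp3 /tmp/out.mp3
# then send /tmp/out.mp3 to the client (or upload it) instead of using pipe:1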

HLS Live streaming with re-encoding

I've run into a technical problem and need your help.
The situation:
I record the screen as well as 1 to 2 audio tracks (microphone and speaker).
These three recordings are done separately (they could be mixed, but I'd prefer not to), and every 10s (this is configurable) I send the chunk of recorded data to my backend. We therefore have 2 to 3 chunks sent every 10s.
These data chunks are interdependent. Example: the first video chunk starts with the headers and a keyframe; the second chunk can begin in the middle of a frame. It's as if you took the entire video and split it at a random byte offset.
The video stream is H.264 in a WebM container. I don't have a lot of control over it.
The audio stream is Opus in a WebM container. I can't use AAC directly, nor do I have much control here either.
Given this reality, the server may be restarted at any time (crash, update, scaling, ...). It doesn't happen often (about 4 times a week). In addition, the customer can close the application or their computer once the recording ends on their side, which prevents the end of the recording from being sent; once the client reconnects, the missing data chunks are sent. This rules out using a "live" stream on the backend side.
Goals:
Store video and audio as it is received on the server in cloud storage.
Be able to start playing the video/audio even when the upload has not finished (so in a live stream)
As soon as the last chunks have been received on the server, I want the entire video to be already available in VoD (Video On Demand) with as little delay as possible.
Everything must be distributed with the audio in AAC. The audio tracks may or may not be mixed together, and may or may not be muxed with the video.
Current solution and its blocking issue:
The most promising solution I have seen is using HLS to support the Live and VoD mode that I need. It would also bring a lot of optimization possibilities for the future.
Video isn't a problem in this context; here's what I do:
1. Every time I get a data chunk, I append it to a screen.webm file.
2. Then I split the file with ffmpeg:
ffmpeg -ss {total_duration_in_storage} -i screen.webm -c:v copy -f hls -hls_time 8 -hls_list_size 0 output.m3u8
3. I ignore the last file unless it's the last chunk.
4. I upload all the files to the cloud storage along with a newly updated output.m3u8 containing the new file information.
Note: total_duration_in_storage corresponds to the time already uploaded to cloud storage, i.e. the sum of the segment durations present in the last output.m3u8.
Note 2: I ignore the last file in step 3 because it ensures a keyframe at the start of each segment of my playlist, and therefore lets me seek so that only the parts needed for each new chunk are segmented. The whole loop looks roughly like the sketch below.
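Put together, one iteration of that loop might look like this (a sketch; chunk.webm and the total_duration_in_storage variable stand in for your actual chunk handling and bookkeeping):
# append the newly received chunk to the growing recording
cat chunk.webm >> screen.webm
# re-segment only from the already-uploaded duration onwards
ffmpeg -ss "$total_duration_in_storage" -i screen.webm -c:v copy -f hls -hls_time 8 -hls_list_size 0 output.m3u8
# drop the newest segment unless this was the final chunk, then upload the
# remaining segments and the updated output.m3u8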
My problem is with the audio. I can use the same method without re-encoding and it works fine, but I need to re-encode to AAC to be compatible with HLS, and in particular with Safari.
If I re-encode only the new chunks as they arrive, there is an audible glitch at each boundary.
The only avenue I have found is to re-encode and segment all the files each time a new chunk arrives. This will be problematic for long recordings (multiple hours).
Do you have any solutions for this problem or another way to achieve my goal?
Thanks a lot for your help!

ffmpeg mix 20 audio streams without making it quieter

I want to mix about 20 audio streams with ffmpeg's amix filter; however, as described here, amix has a weird way of making the input streams quieter the more of them you mix together:
"amix scales each input's volume by 1/n where n = no. of active inputs. This is evaluated for each audio frame. So when an input drops out, the volume of the remaining inputs is scaled by a smaller amount, hence their volumes increase"
How can I get rid of this annoying behaviour?
I just want the audio streams to keep the same loudness, since only one of them carries actual audio at any given time anyway.
At the moment I end up with a file that is about 1/20 the loudness of the original, making it effectively unusable.
Adjust the volume of each stream by multiplying it by n (20 in your case):
https://ffmpeg.org/ffmpeg-filters.html#volume
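For example, with three inputs (a sketch with placeholder file names; scale the input count and the volume factor to 20 the same way):
ffmpeg -i in1.wav -i in2.wav -i in3.wav -filter_complex "amix=inputs=3,volume=3" mixed.wav
Depending on your ffmpeg version, amix may also accept a normalize=0 option that disables the 1/n scaling entirely.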

Determining the moment of audio attenuation through ffmpeg

There are audio tracks of different lengths in m4a format, and there's the ffmpeg library for working with the media. Many of the tracks have a "decay" (fade-out) effect at the end, and it is necessary to determine at what point it occurs (determined once, with the value entered into the database along with other information about the track). That is, we must somehow determine that the track begins to fade and that its volume has reached 30% of the overall volume of the song. Is it possible to solve this by means of ffmpeg, and if so, how?
If you run this command,
ffmpeg -i in.mp4 -af astats=metadata=1:reset=1,ametadata=print:key=lavfi.astats.Overall.RMS_level:file=vol.log -vn -f null -
it will generate a file called vol.log which looks like this:
frame:8941 pts:9155584 pts_time:190.741
lavfi.astats.Overall.RMS_level=-79.715762
frame:8942 pts:9156608 pts_time:190.763
lavfi.astats.Overall.RMS_level=-83.973798
frame:8943 pts:9157632 pts_time:190.784
lavfi.astats.Overall.RMS_level=-90.068668
frame:8944 pts:9158656 pts_time:190.805
lavfi.astats.Overall.RMS_level=-97.745197
frame:8945 pts:9159680 pts_time:190.827
lavfi.astats.Overall.RMS_level=-125.611266
frame:8946 pts:9160704 pts_time:190.848
lavfi.astats.Overall.RMS_level=-inf
frame:8947 pts:9161728 pts_time:190.869
lavfi.astats.Overall.RMS_level=-inf
The pts_time is the time index, and the RMS level is the mean volume of that interval (21 ms here). Each drop of 6 dB corresponds to a halving of the current volume.
If you run the command with reset=0, the last reading in the generated log file will show the RMS volume for the whole file. A volume that is 30% of that mean volume is then ~10.5 dB below the mean value (since 20·log10(0.3) ≈ −10.5 dB).
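That variant is the same command with only the reset value changed:
ffmpeg -i in.mp4 -af astats=metadata=1:reset=0,ametadata=print:key=lavfi.astats.Overall.RMS_level:file=vol.log -vn -f null -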

Setting dwScale and dwRate values in the AVISTREAMHEADER structure at AVI muxing

While capturing from some audio and video sources and encoding into an AVI container, I set the audio as the master stream for audio/video synchronization, and this gave the best synchronization result.
http://msdn.microsoft.com/en-us/library/windows/desktop/dd312034(v=vs.85).aspx
But this method results in a higher FPS value: about 40 or 50 instead of 30 FPS.
If the media file is just played back, everything is OK, but if you try to re-encode it with different software to another video format, it comes out of sync.
How can I programmatically set dwScale and dwRate values in the AVISTREAMHEADER structure at AVI muxing?
MSDN:
"This method works by adjusting the dwScale and dwRate values in the AVISTREAMHEADER structure."
You requested that the multiplexer manage the scale/rate values, so you cannot adjust them yourself. You should be seeing more odd things in your file, not just a higher FPS. The file itself is perhaps out of sync, and as soon as you process it with other applications that don't do playback fine-tuning, you start seeing issues. The video media type might report one frame rate while the effective rate is different.
