How to Simply Remove Duplicate Frames from a Video using ffmpeg - linux

First of all, I'd preface this by saying I'm NO EXPERT with video manipulation,
although I've been fiddling with ffmpeg for years (in a fairly limited way). Hence, I'm not too flash with all the terminology folk often use, or with how it affects what I'm trying to do in my manipulations... but I'll have a go with this anyway...
I've checked a few links here, for example:
ffmpeg - remove sequentially duplicate frames
...but the content didn't really help me.
I have some hundreds of video clips that have been created under both Windows and Linux using both ffmpeg and other similar applications. However, they have some problems with times in the video where the display is 'motionless'.
As an example, let's say we have some web site that streams live video into, say, a Flash video player/plugin in a web browser - in this case, a traffic camera video stream.
There's an instance of ffmpeg running that is capturing a region of the (Windows) desktop into a video file, viz:-
ffmpeg -hide_banner -y -f dshow ^
-i video="screen-capture-recorder" ^
-vf "setpts=1.00*PTS,crop=448:336:620:360" ^
-an -r 25 -vcodec libx264 -crf 0 -qp 0 ^
-preset ultrafast SAMPLE.flv
Let's say the actual 'display' that is being captured looks like this:-
123456789 XXXXX 1234567 XXXXXXXXXXX 123456789 XXXXXXX
^---a---^ ^-P-^ ^--b--^ ^----Q----^ ^---c---^ ^--R--^
...where each character position represents a (sequence of) frame(s). Owing to a poor internet connection, a "single frame" can be displayed for an extended period (the 'X' characters being an (almost) exact copy of the immediately previous frame). So this means we have segments of the captured video where the image doesn't change at all (to the naked eye, anyway).
How can we deal with the duplicate frames?... and how does our approach change if the 'duplicates' are NOT the same to ffmpeg but LOOK more-or-less the same to the viewer?
If we simply remove the duplicate frames, the 'pacing' of the video is lost, and what used to take, maybe, 5 seconds to display now takes a fraction of a second, giving a very jerky, unnatural motion, although there are no duplicate images in the video. This seems to be achievable using ffmpeg's 'mpdecimate' filter, viz:-
ffmpeg -i SAMPLE.flv ^ ... (i)
-r 25 ^
-vf mpdecimate,setpts=N/FRAME_RATE/TB DEC_SAMPLE.mp4
That reference I quoted uses a command that shows which frames 'mpdecimate' will remove when it considers them to be 'the same', viz:-
ffmpeg -i SAMPLE.flv ^ ... (ii)
-vf mpdecimate ^
-loglevel debug -f null -
...but knowing that (rather awkwardly formatted) information, how can we re-organize the video without executing multiple runs of ffmpeg to extract 'slices' of video for re-combining later?
In that case, I'm guessing we'd have to run something like:-
- user specifies a 'threshold duration' for the duplicates (maybe run for 1 sec only)
- determine & save main video information (fps, etc. - assuming constant frame rate)
- map the (frame/time where duplicates start) -> no. of frames/duration of duplicates (see the sketch below)
- if the duration of duplicates is less than the user threshold, don't consider this period as a 'series of duplicate frames' and move on
- extract the 'non-duplicate' video segments (a, b & c in the diagram above)
- create 'new video' (empty) with original video's specs
- for each video segment:
  - extract the last frame of the segment
  - create a short video clip with repeated frames of the frame just extracted (duration = user spec. = 1 sec)
  - append (current video segment + short clip) to 'new video' and repeat
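As a rough sketch of that 'mapping' step (assuming the mpdecimate debug log prints drop/keep lines of the form 'drop pts:... pts_time:...' - the exact format varies between ffmpeg versions, so the pattern may need adjusting):
ffmpeg -i SAMPLE.flv -vf mpdecimate -loglevel debug -f null - 2>&1 | grep -o 'drop pts:[^ ]* pts_time:[0-9.]*' | sed 's/.*pts_time://' > dropped_times.txt
Consecutive runs of timestamps in dropped_times.txt would then give the start time and duration of each 'series of duplicate frames'.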
...but in my case, a lot of the captured videos might be 30 minutes long and have hundreds of 10 sec long pauses, so the 'rebuilding' of the videos will take a long time using this method.
This is why I'm hoping there's some "reliable" and "more intelligent" way to use
ffmpeg (with/without the 'mpdecimate' filter) to do the 'decimate' function in only a couple of passes or so... Maybe there's a way that the required segments could even be specified (in a text file, for example) and, as ffmpeg runs, it will
stop/restart its transcoding at specified times/frame numbers?
Short of this, is there another application (for use on Windows or Linux) that could do what I'm looking for, without having to manually set start/stop points and extract/combine video segments by hand...?
I've been trying to do all this with ffmpeg N-79824-gcaee88d under Win7-SP1 and (a different version I don't currently remember) under Puppy Linux Slacko 5.6.4.
Thanks a heap for any clues.

I assume what you want to do is to keep frames with motion and up to 1 second of duplicate frames, but discard the rest.
ffmpeg -i in.mp4 -vf "select='if(gt(scene,0.01),st(1,t),lte(t-ld(1),1))',setpts=N/FRAME_RATE/TB" trimmed.mp4
What the select filter expression does is make use of an if-then-else operator:
gt(scene,0.01) checks if the current frame has detected motion relative to the previous frame. The value will have to be calibrated based on manual observation by seeing which value accurately captures actual activity as compared to sensor/compression noise or visual noise in the frame. See here on how to get a list of all scene change values.
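A quick way to get those scene-change scores (the same technique is used in an answer further down this page; the select filter exports each frame's score as the lavfi.scene_score metadata key):
ffmpeg -i in.mp4 -vf select="'gte(scene,0)',metadata=print" -f null -
Every frame passes the gte(scene,0) test, so the metadata filter prints a lavfi.scene_score line for each one.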
If the frame is evaluated to have motion, the then clause evaluates st(1,t). The function st(val,expr) stores the value of expr in a variable numbered val and it also returns that expression value as its result. So, the timestamp of the kept frames will keep on being updated in that variable until a static frame is encountered.
The else clause checks the difference between the current frame timestamp and the timestamp of the stored value. If the difference is less than 1 second, the frame is kept, else discarded.
The setpts sanitizes the timestamps of all selected frames.
Edit: I tested my command with a video input I synthesized and it worked.

I've done a bit of work on this question... and have found the following works pretty well...
It seems like the input video has to have a "constant frame rate" for things to work properly, so the first command is:-
ffmpeg -i test.mp4 ^
-vf "scale=trunc(iw/2)*2:trunc(ih/2)*2" ^
-vsync cfr test01.mp4
I then need to look at the 'scores' for each frame. Such a listing is produced by:-
ffmpeg -i test01.mp4 ^
-vf select="'gte(scene,0)',metadata=print" -f null -
I'll look at all those scores... and average them (mean) - a bit dodgy but it seems to work Ok. In this example, that average score is '0.021187'.
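For what it's worth, the averaging can be scripted from a Linux shell (a rough sketch - it assumes the metadata filter's file option and the lavfi.scene_score key name):
ffmpeg -i test01.mp4 -vf select="'gte(scene,0)',metadata=print:file=scores.txt" -f null -
grep scene_score scores.txt | cut -d= -f2 | awk '{ sum += $1; n++ } END { print sum / n }'
The first command dumps one lavfi.scene_score=... line per frame into scores.txt; the second strips out the values and prints their mean.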
I then have to select a 'persistence' value -- how long to let the 'duplicated' frames run. If you force it to only keep one frame, the entire video will tend to run much too quickly... So, I've been using 0.2 seconds as a starting point.
So the next command becomes:-
ffmpeg -i test01.mp4 ^
-vf "select='if(gt(scene,0.021187),st(1,t),lte(t-ld(1),0.20))',setpts=N/FRAME_RATE/TB" ^
output.mp4
After that, the resultant 'output.mp4' video seems to work pretty well. It's only a bit of fiddling with the 'persistence' value that might need to be done to compromise between having a smoother-playing video and scenes that change a bit abruptly.
I've put together some Perl code that works Ok, which I'll work out how to post, if folks are interested in it... eventually(!)
Edit: Another advantage of doing this 'decimating' is that files are of shorter duration (obviously) AND they are smaller in size. For example, a sample video that ran for 00:07:14 and was 22MB in size went to 00:05:35 and 11MB.

Variable frame rate encoding is totally possible, but I don't think it does what you think it does. I am assuming that you wish to remove these duplicate frames to save space/bandwidth? If so, it will not work, because the codec is already doing it. Codecs use reference frames and only encode what has changed from the reference, so the duplicate frames take almost no space to begin with. Basically, frames are just encoded as a packet of data saying: copy the previous frame and make these changes. The X frames have zero changes, so it only takes a few bytes to encode each one.
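You can demonstrate this to yourself with ffmpeg's built-in lavfi test sources (a sketch; file names are arbitrary) - encode a static clip and a moving clip of the same length and compare the file sizes:
ffmpeg -f lavfi -i color=c=gray:s=640x480:d=30:r=25 -c:v libx264 -crf 23 static.mp4
ffmpeg -f lavfi -i testsrc=s=640x480:d=30:r=25 -c:v libx264 -crf 23 moving.mp4
ls -l static.mp4 moving.mp4
The static clip should come out far smaller, even though both contain the same number of frames.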

Related

Beeping out portions of an audio file using ffmpeg

I'm trying to use ffmpeg to beep out sections of an audio file (say 10-15 and 20-30). However, only the first portion (10-15) gets beeped, whilst the next portion just gets muted.
ffmpeg -i input.mp3 -filter_complex "[0]volume=0:enable='between(t,10,15)+between(t,20,30)'[main];sine=d=5:f=800,adelay=10s,pan=stereo|FL=c0|FR=c0[beep];[main][beep]amix=inputs=2" output.wav
Using this as my reference, but not able to make much progress.
Edit: Well, sine=d=5 clearly specifies the duration as 5 (my bad). It seems this command can only add beeping to one specific portion; how can I change it to add beeps to different sections with varying durations?
ffmpeg -i input.mp3 -af "volume=enable='between(t,5,10)':volume=0[main];sine=d=5:f=800,adelay=5s,pan=stereo|FL=c0|FR=c0[beep];[main][beep]amix=inputs=2,volume=enable='between(t,15,20)':volume=0[main];sine=d=5:f=800,adelay=15s,pan=stereo|FL=c0|FR=c0[beep];[main][beep]amix=inputs=2,volume=enable='between(t,40,50)':volume=0[main];sine=d=10:f=800,adelay=40s,pan=stereo|FL=c0|FR=c0[beep];[main][beep]amix=inputs=2" output.wav
The above code beeps 5-10, 15-20 and 40-50
This seems to work: separate the different beeping settings with a comma and make changes in all 3 places - between, sine=d=x (where x is the duration) and adelay=ys (where y is the delay, i.e. when the beeping starts). So between would be (t, y, y+x).
References: Mute specified sections of an audio file using ffmpeg and FFMPEG: Adding beep sound to another audio file in specific time portions
Would love to know an easier/more convenient way of doing this, so I'm not marking this as an answer.
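One possible simplification (a sketch, not tested here): generate a single long sine tone, mute it outside the beep windows with the same between() expressions, and mix once. The d=3600 duration is an arbitrary upper bound, and duration=first trims the mix to the input's length:
ffmpeg -i input.mp3 -filter_complex "[0]volume=0:enable='between(t,5,10)+between(t,15,20)+between(t,40,50)'[main];sine=f=800:d=3600,volume=0:enable='not(between(t,5,10)+between(t,15,20)+between(t,40,50))',pan=stereo|FL=c0|FR=c0[beep];[main][beep]amix=inputs=2:duration=first" output.wav
This way each section only has to be listed in the two enable expressions.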

ffmpeg: cut the video and get the accurate beginning time of the result

I do the cut via:
ffmpeg -i long_clip.mp4 -ss 00:00:10.0 -c copy -t 00:00:04.0 short_clip.mp4
I need to know the precise time where ffmpeg made the cut (the time of the closest keyframe before 00:00:10.0).
Currently, I'm using the following ffprobe command to list all the keyframes and select the closest one before 00:00:10.0:
ffprobe -show_frames -skip_frame nokey long_clip.mp4
It works extremely slowly (I run it on a Jetson Nano, and it takes a few minutes to list the keyframes for a 30 sec video, although the cutting itself is done in 0.2 seconds).
I hope there is a much faster way to find the time of the keyframe where ffmpeg makes the cut, not least because ffmpeg itself seeks to this keyframe and cuts the video in less than half a second.
So, in other words, the question is: how can I get the time of the keyframe where ffmpeg makes the cut without listing all the keyframes?
I think this is not possible. The most information you can get out of a program is at the debug verbosity level. For ffmpeg I just used:
ffmpeg -v debug -i "Princess Chelsea - Frack.mp4" -ss 00:03:00.600 -c copy -to 00:03:03.800 3.mkv 2> out.txt
One has to redirect the output, because with debug there is too much of it to fit in the terminal.
Unfortunately, it gives only some cryptic/internal messages, like
Automatically inserted bitstream filter 'vp9_superframe'; args=''
[matroska @ 0x55987904cac0] Starting new cluster with timestamp 5 at offset 885 bytes
[matroska @ 0x55987904cac0] Writing block of size 375 with pts 5, dts 5, duration 23 at relative offset 9 in cluster at offset 885. TrackNumber 2, keyframe 1
With less verbosity it gives less information, so I think this is not possible. However, what is your actual question? Maybe you need something other than just knowing the time of the cut?
For those who are looking for how to actually cut at the proper time (as I was): one has to re-encode the video rather than use -c copy.
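A possibly faster alternative (a sketch using ffprobe's -read_intervals option, which restricts probing to a window instead of scanning the whole file): list only the keyframes between, say, 5 s and 11 s, and take the last one at or before 00:00:10.0 - that is where -ss ... -c copy will cut:
ffprobe -select_streams v -skip_frame nokey -show_entries frame=pts_time -of csv -read_intervals 5%11 long_clip.mp4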

Splitting an Audio File Into Equal-Length Segments Using FFmpeg

I want to split an audio file into several equal-length segments using FFmpeg. I want to specify the general segment duration (no overlap), and I want FFmpeg to render as many segments as it takes to go over the whole audio file (in other words, the number of segments to be rendered is unspecified).
Also, since I am not very experienced with FFmpeg (I only use it to make simple file conversions with few arguments), I would like a description of the code you should use to do this, rather than just a piece of code that I won't necessarily understand, if possible.
Thank you in advance.
P.S. Here's the context for why I'm trying to do this:
I would like to sample a song into single-bar loops automatically, instead of having to chop them manually using a DAW. All I want to do is align the first beat of the song to the beat grid in my DAW, and then export that audio file and use it to generate one-bar loops in FFmpeg.
In the future, I will try to do something like a batch command in which one can specify the tempo and key signature, and it will generate the loops using FFmpeg automatically (as long as the loop is aligned to the beat grid, as I've mentioned earlier). 😀
You can use the segment muxer. Basic example:
ffmpeg -i input.wav -f segment -segment_time 2 output_%03d.wav
-f segment indicates that the segment muxer should be used for the output.
-segment_time 2 makes each segment 2 seconds long.
output_%03d.wav is the output file name pattern, which will result in output_000.wav, output_001.wav, output_002.wav, and so on.
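For the one-bar-loop use case in the question: a bar of 4/4 lasts 240/BPM seconds (60/BPM per beat times 4 beats), so the segment duration can be computed from the tempo in a small shell script (a sketch; file names are illustrative):
#!/bin/sh
BPM=120                                  # tempo of the track
BAR=$(awk "BEGIN { print 240 / $BPM }")  # one 4/4 bar in seconds (2 s at 120 BPM)
ffmpeg -i input.wav -f segment -segment_time "$BAR" loop_%03d.wav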

hls video: Why is there a difference between segment length and target duration?

I am trying to prepare the files for an HLS video.
As the player I am using video.js, and I am transcoding my content with ffmpeg into multiple streams of different sizes and bitrates.
I tried a lot of options, but mainly I kept the framerate and bitrate constant and produced I-frames every second, as I want to have 3 s segments.
Then I segmented the streams with mp4hls and processed the playlists.
It all seems to work perfectly - the playlists are correct, the I-frames also - but:
the length of the segments is 2 seconds and not, as expected, 3 seconds?
something like:
...-b:v 192k -bufsize 200k -maxrate 192k -r 30 -g 30 -x264opts no-scenecut
and in python:
mp4hls --segment-duration 3 320x180.mp4 480x270.mp4 640x360.mp4 ....
I would like to know if there is an error somewhere in my workflow, or if this is correct, as I read in the HLS specification that a segment must be equal to or smaller than #EXT-X-TARGETDURATION:3.
Can somebody please explain to a beginner why the segments are not the same length as written in the playlist? I could find nothing about this topic. Thanks.
Segments don't all need to be the same duration. The target duration is just a hint as to how often the player should expect a manifest update.
This is almost always a keyframe interval issue. It looks like you're using FFmpeg, which has default settings for scene-cut detection and works out its own keyframes. Therefore, you need to set this explicitly if you want specifically timed segments. Off the top of my head, you need to set the GOP and keyframe values - something like:
-g 60 -keyint_min 60 -sc_threshold 0
will give you strict 2-second segments with a 30 fps input.
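For the 3-second segments asked about here, a 30 fps input needs a keyframe every 90 frames. A sketch using ffmpeg's own hls muxer instead of mp4hls (the -hls_time and -hls_playlist_type options belong to that muxer; adjust bitrates to taste):
ffmpeg -i input.mp4 -c:v libx264 -b:v 192k -maxrate 192k -bufsize 200k -r 30 -g 90 -keyint_min 90 -sc_threshold 0 -f hls -hls_time 3 -hls_playlist_type vod out.m3u8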

Determining the moment of audio attenuation using ffmpeg

There are audio tracks of different lengths in m4a format, and there's the ffmpeg library for working with the media. Many of the tracks have a "fade-out" effect at the end, and it is necessary to determine at what point it occurs (determined once, with the value entered in the database along with other information about the track). That is, we must somehow determine the point where the track begins to fade and its volume has dropped to 30% of the overall volume of the song. Is it possible to solve this by means of ffmpeg, and if so, how?
If you run this command,
ffmpeg -i in.mp4 -af astats=metadata=1:reset=1,ametadata=print:key=lavfi.astats.Overall.RMS_level:file=vol.log -vn -f null -
it will generate a file called vol.log which looks like this
frame:8941 pts:9155584 pts_time:190.741
lavfi.astats.Overall.RMS_level=-79.715762
frame:8942 pts:9156608 pts_time:190.763
lavfi.astats.Overall.RMS_level=-83.973798
frame:8943 pts:9157632 pts_time:190.784
lavfi.astats.Overall.RMS_level=-90.068668
frame:8944 pts:9158656 pts_time:190.805
lavfi.astats.Overall.RMS_level=-97.745197
frame:8945 pts:9159680 pts_time:190.827
lavfi.astats.Overall.RMS_level=-125.611266
frame:8946 pts:9160704 pts_time:190.848
lavfi.astats.Overall.RMS_level=-inf
frame:8947 pts:9161728 pts_time:190.869
lavfi.astats.Overall.RMS_level=-inf
The pts_time is the time index and the RMS level is the mean volume of that interval (about 21 ms here). Each drop of 6 dB corresponds to a halving of the current volume.
If you run the command with reset=0, the last reading in the generated log file will show the RMS volume for the whole file. A volume that is 30% of the mean volume is then ~10.5 dB below the mean value (since 20*log10(0.3) is about -10.5).
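Putting the two runs together, here's a rough shell sketch that reports the first timestamp where the windowed level sits more than 10.5 dB below the whole-file mean (the parsing assumes the log format shown above, and in.m4a stands in for your actual file):
# pass 1: whole-file RMS level (with reset=0 the last reading covers the entire file)
ffmpeg -i in.m4a -af astats=metadata=1:reset=0,ametadata=print:key=lavfi.astats.Overall.RMS_level:file=total.log -vn -f null -
MEAN=$(grep RMS_level total.log | tail -1 | cut -d= -f2)
# pass 2: per-window levels, then print the time of the first window below mean - 10.5 dB
ffmpeg -i in.m4a -af astats=metadata=1:reset=1,ametadata=print:key=lavfi.astats.Overall.RMS_level:file=vol.log -vn -f null -
awk -F= -v mean="$MEAN" '/pts_time/ { split($0, a, "pts_time:"); t = a[2] } /RMS_level/ && $2 != "-inf" && $2 + 0 < mean - 10.5 { print t; exit }' vol.log
Note this finds the first quiet window anywhere in the track, so for fade-out detection you may want to search backwards from the end instead.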
