I do the cut via:
ffmpeg -i long_clip.mp4 -ss 00:00:10.0 -c copy -t 00:00:04.0 short_clip.mp4
I need to know the precise time at which ffmpeg made the cut (the time of the closest keyframe before 00:00:10.0).
Currently, I'm using the following ffprobe command to list all the keyframes and select the closest one before 00:00:10.0:
ffprobe -show_frames -skip_frame nokey long_clip.mp4
It is extremely slow (I run it on a Jetson Nano, and it takes a few minutes to list the keyframes of a 30-second video, although the cutting itself is done in 0.2 seconds).
I hope there is a much faster way to learn the time of the keyframe where ffmpeg makes the cut, if only because ffmpeg itself seeks to this keyframe and cuts the video in less than half a second.
So, in other words, the question is: how can I get the time of the keyframe where ffmpeg makes the cut without listing all the keyframes?
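For reference, the selection step I'm doing now can be sketched in Python like this (a minimal sketch; the [FRAME] lines below are illustrative, and depending on the ffprobe version the field may be named pkt_pts_time rather than pts_time):

```python
def last_keyframe_before(ffprobe_output, target):
    """Scan ffprobe -show_frames -skip_frame nokey output for the
    last keyframe timestamp at or before the target time."""
    best = 0.0
    for line in ffprobe_output.splitlines():
        line = line.strip()
        if line.startswith("pts_time="):
            t = float(line.split("=", 1)[1])
            if t <= target:
                best = max(best, t)
    return best

# Illustrative sample of the [FRAME] blocks ffprobe prints:
sample = """[FRAME]
pts_time=0.000000
[/FRAME]
[FRAME]
pts_time=8.341667
[/FRAME]
[FRAME]
pts_time=12.512500
[/FRAME]"""

print(last_keyframe_before(sample, 10.0))  # 8.341667
```

The slow part is producing that listing in the first place, which is exactly what I'd like to avoid.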
I think this is not possible. The most information you can get from a program is what it prints at the debug verbosity level. For ffmpeg I just used
ffmpeg -v debug -i "Princess Chelsea - Frack.mp4" -ss 00:03:00.600 -c copy -to 00:03:03.800 3.mkv 2> out.txt
One has to redirect the output because, with debug verbosity, there is too much of it to fit in the terminal.
Unfortunately, it gives only some cryptic/internal messages, like
Automatically inserted bitstream filter 'vp9_superframe'; args=''
[matroska @ 0x55987904cac0] Starting new cluster with timestamp 5 at offset 885 bytes
[matroska @ 0x55987904cac0] Writing block of size 375 with pts 5, dts 5, duration 23 at relative offset 9 in cluster at offset 885. TrackNumber 2, keyframe 1
With less verbosity it gives less information, so I think this is not possible. However, what is your actual question? Maybe you need something different apart from just knowing the time of the cuts?
For those who are looking for how to actually cut at the proper time (as I was): one must not use -c copy, but actually decode and re-encode the video.
First of all, I'd preface this by saying I'm NO EXPERT with video manipulation,
although I've been fiddling with ffmpeg for years (in a fairly limited way). Hence, I'm not too flash with all the language folk often use... and how it affects what I'm trying to do in my manipulations... but I'll have a go with this anyway...
I've checked a few links here, for example:
ffmpeg - remove sequentially duplicate frames
...but the content didn't really help me.
I have some hundreds of video clips that have been created under both Windows and Linux using both ffmpeg and other similar applications. However, they have some problems with times in the video where the display is 'motionless'.
As an example, let's say we have some web site that streams a live video into, say, a Flash video player/plugin in a web browser. In this case, we're talking about a traffic camera video stream, for example.
There's an instance of ffmpeg running that is capturing a region of the (Windows) desktop into a video file, viz:-
ffmpeg -hide_banner -y -f dshow ^
-i video="screen-capture-recorder" ^
-vf "setpts=1.00*PTS,crop=448:336:620:360" ^
-an -r 25 -vcodec libx264 -crf 0 -qp 0 ^
-preset ultrafast SAMPLE.flv
Let's say the actual 'display' that is being captured looks like this:-
123456789 XXXXX 1234567 XXXXXXXXXXX 123456789 XXXXXXX
^---a---^ ^-P-^ ^--b--^ ^----Q----^ ^---c---^ ^--R--^
...where each character position represents a (sequence of) frame(s). Owing to a poor internet connection, a "single frame" can be displayed for an extended period (the 'X' characters being an (almost) exact copy of the immediately previous frame). So this means we have segments of the captured video where the image doesn't change at all (to the naked eye, anyway).
How can we deal with the duplicate frames?... and how does our approach change if the 'duplicates' are NOT the same to ffmpeg but LOOK more-or-less the same to the viewer?
If we simply remove the duplicate frames, the 'pacing' of the video is lost: what used to take, maybe, 5 seconds to display now takes a fraction of a second, giving a very jerky, unnatural motion, even though there are no duplicate images left in the video. This seems to be achievable using ffmpeg with the 'mpdecimate' filter, viz:-
ffmpeg -i SAMPLE.flv ^ ... (i)
-r 25 ^
-vf mpdecimate,setpts=N/FRAME_RATE/TB DEC_SAMPLE.mp4
That reference I quoted uses a command that shows which frames 'mpdecimate' will remove when it considers them to be 'the same', viz:-
ffmpeg -i SAMPLE.flv ^ ... (ii)
-vf mpdecimate ^
-loglevel debug -f null -
...but knowing that (complicated formatted) information, how can we re-organize the video without executing multiple runs of ffmpeg to extract 'slices' of video for re-combining later?
In that case, I'm guessing we'd have to run something like:-
user specifies a 'threshold duration' for the duplicates
(maybe run for 1 sec only)
determine & save main video information (fps, etc - assuming
constant frame rate)
map the (frame/time where duplicates start)->no. of
frames/duration of duplicates
if the duration of duplicates is less than the user threshold,
don't consider this period as a 'series of duplicate frames'
and move on
extract the 'non-duplicate' video segments (a, b & c in the
diagram above)
create 'new video' (empty) with original video's specs
for each video segment
extract the last frame of the segment
create a short video clip with repeated frames of the frame
just extracted (duration = user spec. = 1 sec)
append (current video segment+short clip) to 'new video'
and repeat
...but in my case, a lot of the captured videos might be 30 minutes long and have hundreds of 10 sec long pauses, so the 'rebuilding' of the videos will take a long time using this method.
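To make the bookkeeping in that outline concrete, here's a minimal Python sketch with invented data; it only computes which frames would be kept when runs of duplicates are truncated to a threshold, not the actual video slicing:

```python
def frames_to_keep(dup_flags, max_dup_run):
    """Return indices of frames to keep, truncating each run of
    duplicate frames to at most max_dup_run frames (the 'persistence')."""
    keep, run = [], 0
    for i, is_dup in enumerate(dup_flags):
        if is_dup:
            run += 1
            if run <= max_dup_run:
                keep.append(i)
        else:
            run = 0
            keep.append(i)
    return keep

# invented example: frames 1-4 duplicate frame 0; keep at most 2 of them
print(frames_to_keep([False, True, True, True, True, False, False], 2))
# [0, 1, 2, 5, 6]
```

The hard part, of course, is doing the equivalent selection inside ffmpeg in a single pass rather than slicing and re-joining files.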
This is why I'm hoping there's some "reliable" and "more intelligent" way to use ffmpeg (with/without the 'mpdecimate' filter) to do the 'decimate' function in only a couple of passes or so... Maybe there's a way that the required segments could even be specified (in a text file, for example) and, as ffmpeg runs, it will stop/restart its transcoding at the specified times/frame numbers?
Short of this, is there another application (for use on Windows or Linux) that could do what I'm looking for, without having to manually set start/stop points,
extracting/combining video segments manually...?
I've been trying to do all this with ffmpeg N-79824-gcaee88d under Win7-SP1 and (a different version I don't currently remember) under Puppy Linux Slacko 5.6.4.
Thanks a heap for any clues.
I assume what you want to do is keep the frames with motion, plus up to 1 second of duplicate frames, and discard the rest.
ffmpeg -i in.mp4 -vf "select='if(gt(scene,0.01),st(1,t),lte(t-ld(1),1))',setpts=N/FRAME_RATE/TB" trimmed.mp4
What the select filter expression does is make use of an if-then-else operator:
gt(scene,0.01) checks whether the current frame has detected motion relative to the previous frame. The threshold value will have to be calibrated by manual observation, seeing which value accurately captures actual activity as opposed to sensor/compression noise or visual noise in the frame. See here for how to get a list of all scene change values.
If the frame is evaluated to have motion, the then clause evaluates st(1,t). The function st(val,expr) stores the value of expr in a variable numbered val and it also returns that expression value as its result. So, the timestamp of the kept frames will keep on being updated in that variable until a static frame is encountered.
The else clause checks the difference between the current frame timestamp and the timestamp of the stored value. If the difference is less than 1 second, the frame is kept, else discarded.
The setpts sanitizes the timestamps of all selected frames.
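To see the st()/ld() mechanics in isolation, here is a small Python simulation of that select expression (a sketch only; the frame times and scene scores below are invented, and the corner case where select treats a zero return value from st() as "drop" is ignored):

```python
def select_frames(frames, scene_thresh, persist):
    """Simulate select='if(gt(scene,thresh),st(1,t),lte(t-ld(1),persist))'.
    frames is a list of (timestamp, scene_score) pairs."""
    kept = []
    last_motion_t = 0.0  # ffmpeg expression variables, like ld(1), default to 0
    for t, scene in frames:
        if scene > scene_thresh:            # gt(scene, thresh)
            last_motion_t = t               # st(1, t): store timestamp of motion
            kept.append(t)
        elif t - last_motion_t <= persist:  # lte(t - ld(1), persist)
            kept.append(t)
    return kept

# invented data: motion at 0.04s and 1.24s, static frames in between
frames = [(0.04, 0.5), (0.08, 0.0), (1.2, 0.0),
          (1.24, 0.6), (1.28, 0.0), (2.5, 0.0)]
print(select_frames(frames, 0.01, 1.0))  # [0.04, 0.08, 1.24, 1.28]
```

The static frame at 1.2s is dropped because it is more than 1 second after the last motion frame, while the one at 1.28s is kept.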
Edit: I tested my command with a video input I synthesized and it worked.
I've done a bit of work on this question... and have found the following works pretty well...
It seems like the input video has to have a "constant frame rate" for things to work properly, so the first command is:-
ffmpeg -i test.mp4 ^
-vf "scale=trunc(iw/2)*2:trunc(ih/2)*2" ^
-vsync cfr test01.mp4
I then need to look at the 'scores' for each frame. Such a listing is produced by:-
ffmpeg -i test01.mp4 ^
-vf select="'gte(scene,0)',metadata=print" -f null -
I'll look at all those scores and average them (mean) - a bit dodgy, but it seems to work OK. In this example, that average score is '0.021187'.
I then have to select a 'persistence' value -- how long to let the 'duplicated' frames run. If you force it to only keep one frame, the entire video will tend to run much too quickly... So, I've been using 0.2 seconds as a starting point.
So the next command becomes:-
ffmpeg -i test01.mp4 ^
-vf "select='if(gt(scene,0.021187),st(1,t),lte(t-ld(1),0.20))',
setpts=N/FRAME_RATE/TB" output.mp4
After that, the resultant 'output.mp4' video seems to work pretty well. It's only a bit of fiddling with the 'persistence' value that might need to be done to compromise between having a smoother-playing video and scenes that change a bit abruptly.
I've put together some Perl code that works Ok, which I'll work out how to post, if folks are interested in it... eventually(!)
Edit: Another advantage of doing this 'decimating' is that the files are of shorter duration (obviously) AND smaller in size. For example, a sample video that ran for 00:07:14 and was 22 MB in size went to 00:05:35 and 11 MB.
Variable frame rate encoding is entirely possible, but I don't think it does what you think it does. I am assuming that you wish to remove these duplicate frames to save space/bandwidth? If so, it will not work, because the codec is already doing it. Codecs use reference frames and only encode what has changed from the reference; hence the duplicate frames take almost no space to begin with. Basically, frames are just encoded as a packet of data saying: copy the previous frame and make this change. The X frames have zero changes, so it only takes a few bytes to encode each one.
I use the following code to trim, pipe and concatenate my audio files.
sox "|sox audio.wav -p trim 0.000 =15.000" "|sox audio.wav -p trim 15.000" concatenated.wav
One would expect concatenated.wav to sound identical to audio.wav.
However, when both files are played simultaneously together, there is a distinct audio shift on concatenated.wav.
Normally this error would be acceptable, as it is in the milliseconds range. However, as the number of pipes increases (say, to more than 100), the amount of audio shift increases substantially.
What is the correct method to trim, pipe and concatenate audio files using SoX to prevent this error?
Edit 1: Samples were used instead of milliseconds; the same problem occurred.
The following code was used:
sox "|sox audio.wav -p trim 0s =661500s" "|sox audio.wav -p trim 661500s" concatenated.wav
The WAV file sample rate is 44100 Hz and the sample size is 16 bits.
SoX 14.4.2 was used.
The problem is that SoX may lose a few samples at the cut point of the trim command.
I had a similar problem and solved it by cutting not by milliseconds but by samples, which of course depend on the sample rate.
If your cut points are specified in whole samples, you will no longer lose samples, and the combined parts will have exactly the same length as the original.
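The seconds-to-samples conversion is just time × sample rate; e.g. the 15-second cut point at 44100 Hz corresponds to the 661500s figure used in the question:

```python
def to_samples(seconds, rate=44100):
    """Convert a cut point in seconds to a whole-sample position."""
    return round(seconds * rate)

print(to_samples(15.0))  # 661500, i.e. 'trim 0s =661500s' in SoX
```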
I’d like to change the volume level of a particular time range/slice in an audio file using SoX.
Right now, I’m having to:
Trim the original file three times to get: the part before the audio effect change, the part during (where I’m changing the sound level), and the part after
Perform the effect to change the sound level on the extracted “middle” chunk of audio, in its own file
Splice everything back together, taking into account the fading/crossfading 5ms overlaps that SoX recommends
Is there a better way to do this that doesn’t involve writing a script to do the above?
For anyone who stumbles across this highly ranked thread, searching for a way to duck the middle of an audio file:
I've been playing with SoX for ages and the method I built uses pipes to process each part without creating all those temporary files!
The result is a single line solution, though you will need to set timings and so, unless your fade timings will be the same for all files, it may be useful to generate the line with an algorithm.
I was pleased to get piping working, as I know this aspect has proved difficult for others. The command line options can be difficult to get right. However I really didn't like the messy additional files as an alternative.
By using mix functionality and positioning each part using pad, then giving each section trim & fade we can also avoid use of 'splice' here. I really wasn't a fan.
A working single line example, tested in SoX 14.4.2 Windows:
It fades (ducks) by -6dB at 2 seconds, returning to 0dB at 5 seconds (using linear fades of 0.4 seconds):
sox -m -t wav "|sox -V1 inputfile.wav -t wav - fade t 0 2.2 0.4" -t wav "|sox -V1 inputfile.wav -t wav - trim 1.8 fade t 0.4 3.4 0.4 gain -6 pad 1.8" -t wav "|sox -V1 inputfile.wav -t wav - trim 4.8 fade t 0.4 0 0 pad 4.8" outputfile.wav gain 9.542
Let's make that a little more readable here by breaking it down into sections:
Section 1 = full volume, Section 2 = ducked, Section 3 = full volume
sox -m
-t wav "|sox -V1 inputfile.wav -t wav - fade t 0 2.2 0.4"
-t wav "|sox -V1 inputfile.wav -t wav - trim 1.8 fade t 0.4 3.4 0.4 gain -6 pad 1.8"
-t wav "|sox -V1 inputfile.wav -t wav - trim 4.8 fade t 0.4 0 0 pad 4.8"
outputfile.wav gain 9.542
Now, to break it down very thoroughly:
'-m' .. says we're going to mix (this automatically reduces gain, see last parameter)
'-t wav' .. says the piped command that follows will return a WAV (it seems the WAV header is being lost in the pipeline)
Then.. the FIRST piped part (full volume before duck)
'-V1' .. says ignore warnings - there will be a warning about not knowing length of output file for this specific section as it's piping out, but there should be no other warning from this operation
then the input filename
'-t wav' .. forces the output type
'-' .. is the standard name for a piped output which will return to SoX command line
'fade t 0 2.2 0.4' .. fades out the full volume section. t = linear. 0 fade in. Then (as we want the crossfade's halfway point to be at 2 seconds) we fade out by 2.2 seconds, with a 0.4 second fade (the fadeout parameter is for when the fade ENDS!)
'-t wav' .. to advise type of next part - as above
Then.. the SECOND piped part (the ducked section)
'-V1' .. again, to ignore output length warning - see above
then the same input filename
'-t wav' .. forces output type, as above
'-' .. for piped output, see above
'trim 1.8' .. because this middle section will hit the middle of the transition at 2 seconds, so (with a 0.4 second crossfade) the ducked audio file will start 0.2 seconds before that
'fade t 0.4 3.4 0.4' .. to fade in the ducked section & fade back out again. So a 0.4 fade in. Then (the most complicated part) as the next crossfade will end at 5.2 seconds we must take that figure minus trimmed amount for this section, so 5.2-1.8=3.4 (again this is because fadeout position deals with the end timing of the fadeout)
'gain -6' .. is the amount, in dB, by which we should duck
'pad 1.8' .. must match the trim figure above, so that amount of silence is inserted at the start to make it synch when sections are mixed
'-t wav' .. to advise type of next part - as above
Then.. the THIRD piped part (return to full level)
'-V1' .. again - see above
then the same input filename
'-t wav' .. to force output type, as above
'-' .. for piped output, see above
'trim 4.8' .. this final section will start at 5 seconds, but (with a 0.4 second crossfade) the audio will start 0.2 seconds before that
'fade t 0.4 0 0' .. just fade in to this full volume section. No fade out
'pad 4.8' .. must match the trim figure above, as explained above
then output filename
'gain 9.542' .. looks tricky, but basically when you use "-m" to mix 3 files the volume is reduced to 1/3 (one third) by SoX to give headroom.
Rather than defeating that, we boost back to 300%. We get the dB amount of 9.542 from the formula 20*log(3)/log(10)
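You can verify that figure quickly (the formula is just the dB equivalent of a ×3 gain):

```python
import math

# mixing n files with -m scales each by 1/n; boosting back is 20*log10(n) dB
n = 3
gain_db = 20 * math.log10(n)
print(round(gain_db, 3))  # 9.542
```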
If you copy & paste the single line somewhere you can see it all easily, it's a lot less scary than the explanation!
Final thought - I was initially concerned about whether the crossfades needed to be logarithmic rather than linear, but in my case, from listening to the results, linear has definitely given the sound I expected.
You may like to try longer crossfades, or have the point of transition happening earlier or later but I hope that single line gives hope to anyone who thought many temporary files would be required!
Let me know if more clarification would help!
Okay, with ffmpeg and filters it's all quite simple.
Imagine that you have 2 tracks, A and B, and you want to crop one and do something about the volume. The solution would be:
ffmpeg -y -i 1.mp3 -i 2.mp3 \
-filter_complex "[0]aformat=sample_fmts=fltp:sample_rates=44100:channel_layouts=stereo,volume='if(between(t,129.00,129.20),0.15000*(t - 129.00) + 0.03,1)':eval=frame,volume='if(between(t,129.20,181.50),-0.00057*(t - 129.20) + 0.06,1)':eval=frame,volume='if(between(t,181.50,181.60),0.40000*(t - 181.50) + 0.03,1)':eval=frame,volume='if(between(t,181.60,183.50),-0.03684*(t - 181.60) + 0.07,1)':eval=frame,volume='if(between(t,183.50,188.00),0.00000*(t - 183.50) + 0.00,1)':eval=frame,atrim=0.00:56.00,adelay=129000|129000,apad[a];[1]aformat=sample_fmts=fltp:sample_rates=44100:channel_layouts=stereo,volume='if(between(t,0.00,134.00),0.00000*(t - 0.00) + 0.06,1)':eval=frame,atrim=0.00:134.00,apad[b];[a][b]amix=inputs=2,atrim=duration=185.00" -shortest -ac 2 output.mp3
which will take 2 input files, transform both of the streams to the appropriate aformat and then apply volume filters.
The syntax for volume is simple: if time t is between some start and end time, then apply the volume filter, computing the level as the desired start volume plus a coefficient multiplied by the difference between the current time t and the start time.
This increases the volume linearly from the initial volume to the desired value over the range.
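As a sketch of what one such expression computes (the times, slope and base level below are invented, matching the shape slope*(t - start) + base used in the filters above):

```python
def volume_at(t, start, end, slope, base):
    """Simulate volume='if(between(t,start,end), slope*(t-start)+base, 1)'."""
    if start <= t <= end:
        return slope * (t - start) + base
    return 1.0  # outside the range the volume is left untouched

# halfway through a 0.2s ramp starting at level 0.03 with slope 0.15
print(round(volume_at(129.10, 129.00, 129.20, 0.15, 0.03), 3))  # 0.045
```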
atrim will trim the audio chunk after the volume has been adjusted on all ranges.
ffmpeg is just amazing, the expressions could be very complex and many of math functions may be used in the expressions.
I am trying to output the begin-timestamps of periods of silence (since there is background noise, by silence I mean a threshold) in a given audio file. Eventually, I want to split the audio file into smaller audio files, given these timestamps. It is important that no part of the original file be discarded.
I tried
sox in.wav out.wav silence 1 0.5 1% 1 2.0 1% : newfile : restart
(courtesy http://digitalcardboard.com/blog/2009/08/25/the-sox-of-silence/)
Although it somewhat did the job, it also trimmed and discarded the periods of silence, which I do not want to happen.
Is 'silence' the right option, or is there a simpler way to accomplish what I need to do?
Thanks.
Unfortunately not Sox, but ffmpeg has a silencedetect filter that does exactly what you're looking for:
ffmpeg -i in.wav -af silencedetect=noise=-50dB:d=1 -f null -
(detecting a threshold of -50 dB for a minimum of 1 second; cribbed from the ffmpeg documentation)
...this would print a result like this:
Press [q] to stop, [?] for help
[silencedetect @ 0x7ff2ba5168a0] silence_start: 264.718
[silencedetect @ 0x7ff2ba5168a0] silence_end: 265.744 | silence_duration: 1.02612
size=N/A time=00:04:29.53 bitrate=N/A
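Those printed timestamps are easy to scrape afterwards; a small Python sketch, assuming the log has been captured to a string or text file:

```python
import re

def silence_periods(log):
    """Extract (start, end) pairs from ffmpeg silencedetect output."""
    starts = [float(x) for x in re.findall(r"silence_start: ([\d.]+)", log)]
    ends = [float(x) for x in re.findall(r"silence_end: ([\d.]+)", log)]
    return list(zip(starts, ends))

sample = """[silencedetect @ 0x7ff2ba5168a0] silence_start: 264.718
[silencedetect @ 0x7ff2ba5168a0] silence_end: 265.744 | silence_duration: 1.02612"""

print(silence_periods(sample))  # [(264.718, 265.744)]
```

These pairs can then be fed back to a splitter while keeping all of the original audio.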
There is (currently, at least) no way to make the silence effect output the position where it has detected silence, or to retain all of the silent audio.
If you are able to recompile SoX yourself, you could add an output statement yourself to find out about the cut positions, then use trim in a separate invocation to split the file. With the stock version, you are out of luck.
SoX can easily give you the timestamps of the actual silences via a text file. Not the periods of silence directly, but you can calculate those with a simple script.
.dat Text Data files. These files contain a textual representation of the sample data. There is one line at the beginning that contains the sample rate, and one line that contains the number of channels. Subsequent lines contain two or more numeric data items: the time since the beginning of the first sample, and the sample value for each channel. Values are normalized so that the maximum and minimum are 1 and -1. This file format can be used to create data files for external programs such as FFT analysers or graph routines. SoX can also convert a file in this format back into one of the other file formats.
Example containing only 2 stereo samples of silence:
; Sample Rate 8012
; Channels 2
0 0 0
0.00012481278 0 0
So you can do sox in.wav out.dat, then parse the text file and treat as silence any sequence of rows with values close to 0 (depending on your threshold).
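Such a parsing script might look like this (a sketch; the threshold and the sample .dat lines are invented):

```python
def silent_times(dat_text, threshold=0.001):
    """Return timestamps of rows whose samples are all below the
    threshold, from a SoX .dat text dump."""
    times = []
    for line in dat_text.splitlines():
        line = line.strip()
        if not line or line.startswith(";"):  # skip header comments
            continue
        parts = line.split()
        t = float(parts[0])
        samples = [abs(float(v)) for v in parts[1:]]
        if max(samples) < threshold:
            times.append(t)
    return times

sample = """; Sample Rate 8012
; Channels 2
0 0 0
0.00012481278 0 0
0.00024962557 0.5 -0.5"""

print(silent_times(sample))  # [0.0, 0.00012481278]
```

Runs of consecutive silent timestamps can then be grouped into periods of silence.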
necroposting:
You can run a separate script that iterates over all of the sox output files (for f in *.wav) and uses the command soxi -D "$f" to obtain the DURATION of each sound clip.
Then get the system time in seconds with date "+%s" and subtract the duration to find the time the recording started.
I posted this as comments under this related thread. However, they seem to have gone unnoticed =(
I've used
ffmpeg -i myfile.avi -f image2 image-%05d.bmp
to split myfile.avi into frames stored as .bmp files. It seemed to work except not quite. When recording my video, I recorded at a rate of 1000fps and the video turned out to be 2min29sec long. If my math is correct, that should amount to a total of 149,000 frames for the entire video. However, when I ran
ffmpeg -i myfile.avi -f image2 image-%05d.bmp
I only obtained 4472 files. How can I get the original 149k frames?
I also tried to convert the frame rate of my original AVI to 1000fps by doing
ffmpeg -i myfile.avi -r 1000 otherfile.avi
but this didn't seem to fix my concern.
ffmpeg -i myfile.avi -r 1000 -f image2 image-%07d.png
I am not sure outputting 150k bmp files is a good idea. Perhaps png is good enough?
Part one of your math is good: the 2 minutes and 29 seconds is about 149 seconds, and with 1000 fps that makes 149000 frames. However, your output filename only has 5 digits for the number, where 149000 has 6 digits, so try "image-%06d.bmp".
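A quick check of the digit widths (Python's %-formatting behaves like ffmpeg's image2 pattern for these simple cases):

```python
# %05d pads to at least 5 digits; a 6-digit number simply overflows the field
print("image-%05d.bmp" % 149000)  # image-149000.bmp
print("image-%06d.bmp" % 1)       # image-000001.bmp
```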
Then there is the disk size: Do your images fit on the disk? With bmp every image uses its own size. You might try to use jpeg pictures, they compress about 10 times better.
Another idea: if ffmpeg does not find a (reasonable) frame rate, it falls back to 25 or 30 frames per second. You might need to specify it. Do so for both source and target; see the man page (man ffmpeg on unix):
To force the frame rate of the input file (valid for raw formats
only) to 1 fps and the frame rate of the output file to 24 fps:
ffmpeg -r 1 -i input.m2v -r 24 output.avi
For what it's worth: I use ffmpeg -y -i "video.mpg" -sameq "video.%04d.jpg" to split my video into pictures. The -sameq is to force the jpeg to a reasonable quality, and the -y is to avoid overwrite questions. For you:
ffmpeg -y -r 1000 -i "myfile.avi" -sameq "image.%06d.jpg"
I think there is a misconception here: the output of a high-speed video system is unlikely to have an output frame rate of 1000 fps; it will be something rather normal such as 30 (or 50/60) fps. Apart from overloading most video players with that kind of speed, it would be counterproductive to show the sequence at the same speed as it was recorded.
Basically: 1 sec @ 1000 fps input is something like 33 sec @ 30 fps output.
Was the duration of the recorded scene really 2:29 min (resulting in a video of ~82 min at normal rate), or did it take about 4.5 sec (4472 frames), which is 2:29 min at normal playback?
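The arithmetic supports that suspicion:

```python
frames_written = 4472
playback_fps = 30  # assumed normal playback rate

# 4472 frames at 30 fps is about 149 s, i.e. 2:29 at playback rate
print(round(frames_written / playback_fps))  # 149

# if all 149000 recorded frames were kept, playback would last ~82 minutes
print(int(149000 / playback_fps / 60))       # 82
```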
I tried this on ubuntu 18.04 terminal.
ffmpeg -i input_video.avi output_frame_path_images%5d.png
where,
-i = Input