I want to split an audio file into several equal-length segments using FFmpeg. I want to specify the general segment duration (no overlap), and I want FFmpeg to render as many segments as it takes to go over the whole audio file (in other words, the number of segments to be rendered is unspecified).
Also, since I am not very experienced with FFmpeg (I only use it to make simple file conversions with few arguments), I would like a description of the code you should use to do this, rather than just a piece of code that I won't necessarily understand, if possible.
Thank you in advance.
P.S. Here's the context for why I'm trying to do this:
I would like to sample a song into single-bar loops automatically, instead of having to chop them manually using a DAW. All I want to do is align the first beat of the song to the beat grid in my DAW, and then export that audio file and use it to generate one-bar loops in FFmpeg.
In the future, I will try to do something like a batch command in which one can specify the tempo and key signature, and it will generate the loops using FFmpeg automatically (as long as the loop is aligned to the beat grid, as I've mentioned earlier). 😀
You can use the segment muxer. Basic example:
ffmpeg -i input.wav -f segment -segment_time 2 output_%03d.wav
-f segment indicates that the segment muxer should be used for the output.
-segment_time 2 makes each segment 2 seconds long.
output_%03d.wav is the output file name pattern, which will result in output_000.wav, output_001.wav, output_002.wav, and so on.
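Since you mentioned wanting to drive this from a tempo later on: the segment length for a one-bar loop is just beats-per-bar × 60 ÷ BPM, so the command above can be wrapped in a small script. This is only a rough sketch of that idea (the variable names and the awk call are mine, not part of ffmpeg):

# Hypothetical wrapper: compute the one-bar duration from tempo and time
# signature, then split. At 120 BPM in 4/4, one bar = 4 * 60 / 120 = 2 seconds.
BPM=120
BEATS_PER_BAR=4
BAR_LEN=$(awk -v bpm="$BPM" -v beats="$BEATS_PER_BAR" 'BEGIN { print beats * 60 / bpm }')
ffmpeg -i input.wav -f segment -segment_time "$BAR_LEN" output_%03d.wav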
Related
I have multiple videos of the same resolution, each with a different length, and a fixed length for the output file. Let's say 4 minutes. Assume there are 4 input files of 30 seconds each, although each input file could have a different length. I want the first 30 seconds of the output file to be blank, the next 30 seconds to be the 1st input file, the next 10 seconds blank, the next 30 seconds the 2nd input file, and so on. Basically, I have a predetermined start point for each input file, and the gaps in between should be a black screen. How can I achieve this? ffmpeg commands are fine, but I'm going to have to automate this in Node.js, so any tips on that would be great!
There doesn't seem to be a single ffmpeg command to do this, so I had to split the problem into smaller problems.
First I generated a list of the video segments that are going to be part of the final output video. Some of these segments are already present and some are to be black video.
So I used an ffmpeg command to generate a black video with silent audio of the desired length. Now I have all the segments I need, and it's just a matter of combining them one after another.
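For reference, one way such a black filler segment can be generated is with ffmpeg's lavfi color and anullsrc sources. The resolution, frame rate, and codecs below are assumptions, and the black segments would need to be encoded with the same parameters as the real clips if you want to concatenate with -c copy afterwards:

# Hypothetical example: 30 seconds of black video with silent stereo audio.
ffmpeg -f lavfi -i color=c=black:s=1280x720:r=25 \
       -f lavfi -i anullsrc=r=44100:cl=stereo \
       -t 30 -c:v libx264 -c:a aac black_30s.mp4

# Then list every segment (black and real) in order in a text file and join:
#   file 'black_30s.mp4'
#   file 'input1.mp4'
#   ...
ffmpeg -f concat -safe 0 -i segments.txt -c copy output.mp4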
Using FFmpeg, I am trying to combine many audio files into one long one, with a crossfade between each of them. To keep the numbers simple, let's say I have 10 input files, each 5 minutes, and I want a 10 second crossfade between each. (Resulting duration would be 48:30.) Assume all input files have the same codec/bitrate.
I was pleasantly surprised to find how simple it was to crossfade two files:
ffmpeg -i 0.mp3 -i 1.mp3 -vn -filter_complex acrossfade=d=10:c1=tri:c2=tri out.mp3
But the acrossfade filter does not allow 3+ inputs. So my naive solution is to repeatedly run ffmpeg, crossfading the previous intermediate output with the next input file. It's not ideal. It leads me to two questions:
1. Does acrossfade losslessly copy the streams? (Except where they're actively crossfading, of course.) Or do the entire input streams get reencoded?
If the input streams are entirely reencoded, then my naive approach is very bad. In the example above (calling acrossfade 9 times), the first 4:50 of the first file would be reencoded 9 times! If I'm combining 50 files, the first file gets reencoded 49 times!
2. To avoid multiple runs and the reencoding issue, can I achieve the many-crossfade behavior in a single ffmpeg call?
I imagine I would need some long filtergraph, but I haven't figured it out yet. Does anyone have an example of crossfading just 3 input files? From that I could automate the filtergraphs for longer chains.
Thanks for any tips!
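For what it's worth, acrossfade pairs can be chained inside a single -filter_complex, feeding each crossfaded result into the next crossfade, so everything happens in one run (with a single encode of the output). A sketch for three inputs, reusing the filter options from the two-file command above; this only illustrates the chaining pattern, it is not a tested answer:

ffmpeg -i 0.mp3 -i 1.mp3 -i 2.mp3 -vn -filter_complex \
  "[0:a][1:a]acrossfade=d=10:c1=tri:c2=tri[a01];[a01][2:a]acrossfade=d=10:c1=tri:c2=tri[out]" \
  -map "[out]" out.mp3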
I'm using gstreamer (gst-launch-1.0 actually) to receive audio and encode it using flacenc. At this point, for testing, the command line looks like this:
gst-launch-1.0 -q autoaudiosrc ! flacenc ! fdsink
This is actually launched by a separate program that gets the FLAC native format data via the child process's stdout.
Now, what I want to be able to do, for archiving purposes, is segment this audio stream into multiple files of limited duration, e.g. one file per minute. I have written code that does the minimal work necessary to parse the stream, segment audio frames, buffer them, and output fully-formed FLAC files. However, in the long term, I'm concerned about the CPU load once I'm archiving hundreds of streams.
The main problem is the frame number. It has a variable length encoding, and even worse, this requires two CRCs to be recomputed for every frame. Wouldn't it be nice if I could either:
Have gstreamer reset the frame number every so often, or even better
Have gstreamer start a whole new file mid-stream?
The latter case would be ideal. If I just dumped this to a file, it wouldn't be a valid FLAC file. After the first segment, the reader would find a file header where it expects a frame header and puke. But I can handle that in my receiving code.
I'm working on trying to figure out how to use various mux and split filters, but most combinations I have tried have resulted in errors of this ilk:
WARNING: erroneous pipeline: could not link flacenc0 to splitmuxsink0
I am also aware that I can use the gstreamer library and probably do stuff like this in my own code where I keep the audio source going and keep bringing the FLAC encoder up and down. A few months ago, I tried to figure out in general how to write programs that link to the gstreamer API and just got thoroughly lost. I was probably not looking at the right docs.
So far I've also found clever ways to do what I wanted with the gstreamer command line. For instance, I managed to get metadata inserted into an MPEG-TS stream from a fifo. So maybe I can manage to solve this problem the same way, with some help from kind stackoverflow users. :)
CLARIFICATION: I don't want gstreamer to write multiple files. I want it to generate multiple files but have them concatenated going through stdout and have a completely separate program split them into files.
The default muxer selected by splitmuxsink is mp4mux, which does not support FLAC. Setting muxer=matroskamux, for example, lets you use splitmuxsink, though you'll get FLAC contained in Matroska, which may or may not be what you want.
While this is likely not working yet, you could try and make flacparse usable as a muxer in splitmuxsink in order to avoid the container.
Meanwhile, you can always use a container for the split, and then remove the container using the sink property. The following is an example pipeline that generates 5-second FLAC files.
gst-launch-1.0 audiotestsrc ! flacenc ! flacparse ! sm.audio_0 \
splitmuxsink name=sm muxer=matroskamux \
location=audio%05d.flac \
max-size-time=5000000000 \
sink="matroskademux ! filesink"
First of all, I'd preface this by saying I'm NO EXPERT with video manipulation,
although I've been fiddling with ffmpeg for years (in a fairly limited way). Hence, I'm not too flash with all the language folk often use... and how it affects what I'm trying to do in my manipulations... but I'll have a go with this anyway...
I've checked a few links here, for example:
ffmpeg - remove sequentially duplicate frames
...but the content didn't really help me.
I have some hundreds of video clips that have been created under both Windows and Linux using both ffmpeg and other similar applications. However, they have some problems with times in the video where the display is 'motionless'.
As an example, let's say we have some web site that streams a live video into, say, a Flash video player/plugin in a web browser. In this case, we're talking about a traffic camera video stream, for example.
There's an instance of ffmpeg running that is capturing a region of the (Windows) desktop into a video file, viz:-
ffmpeg -hide_banner -y -f dshow ^
-i video="screen-capture-recorder" ^
-vf "setpts=1.00*PTS,crop=448:336:620:360" ^
-an -r 25 -vcodec libx264 -crf 0 -qp 0 ^
-preset ultrafast SAMPLE.flv
Let's say the actual 'display' that is being captured looks like this:-
123456789 XXXXX 1234567 XXXXXXXXXXX 123456789 XXXXXXX
^---a---^ ^-P-^ ^--b--^ ^----Q----^ ^---c---^ ^--R--^
...where each character position represents a (sequence of) frame(s). Owing to a poor internet connection, a "single frame" can be displayed for an extended period (the 'X' characters being an (almost) exact copy of the immediately previous frame). So this means we have segments of the captured video where the image doesn't change at all (to the naked eye, anyway).
How can we deal with the duplicate frames?... and how does our approach change if the 'duplicates' are NOT the same to ffmpeg but LOOK more-or-less the same to the viewer?
If we simply remove the duplicate frames, the 'pacing' of the video is lost, and what used to take, maybe, 5 seconds to display now takes a fraction of a second, giving a very jerky, unnatural motion, although there are no duplicate images in the video. This seems to be achievable using ffmpeg with the 'mpdecimate' filter, viz:-
ffmpeg -i SAMPLE.flv ^ ... (i)
-r 25 ^
-vf mpdecimate,setpts=N/FRAME_RATE/TB DEC_SAMPLE.mp4
That reference I quoted uses a command that shows which frames 'mpdecimate' will remove when it considers them to be 'the same', viz:-
ffmpeg -i SAMPLE.flv ^ ... (ii)
-vf mpdecimate ^
-loglevel debug -f null -
...but knowing that (complicated formatted) information, how can we re-organize the video without executing multiple runs of ffmpeg to extract 'slices' of video for re-combining later?
In that case, I'm guessing we'd have to run something like:-
- user specifies a 'threshold duration' for the duplicates (maybe run for 1 sec only)
- determine & save main video information (fps, etc - assuming constant frame rate)
- map the (frame/time where duplicates start) -> no. of frames/duration of duplicates
- if the duration of duplicates is less than the user threshold, don't consider this period as a 'series of duplicate frames' and move on
- extract the 'non-duplicate' video segments (a, b & c in the diagram above)
- create 'new video' (empty) with original video's specs
- for each video segment:
  - extract the last frame of the segment
  - create a short video clip with repeated frames of the frame just extracted (duration = user spec. = 1 sec)
  - append (current video segment + short clip) to 'new video' and repeat
...but in my case, a lot of the captured videos might be 30 minutes long and have hundreds of 10 sec long pauses, so the 'rebuilding' of the videos will take a long time using this method.
This is why I'm hoping there's some "reliable" and "more intelligent" way to use ffmpeg (with/without the 'mpdecimate' filter) to do the 'decimate' function in only a couple of passes or so... Maybe there's a way that the required segments could even be specified (in a text file, for example) so that, as ffmpeg runs, it will stop/restart its transcoding at the specified times/frame numbers?
Short of this, is there another application (for use on Windows or Linux) that could do what I'm looking for, without having to manually set start/stop points or extract/combine video segments by hand...?
I've been trying to do all this with ffmpeg N-79824-gcaee88d under Win7-SP1 and (a different version I don't currently remember) under Puppy Linux Slacko 5.6.4.
Thanks a heap for any clues.
I assume what you want to do is to keep frames with motion and up to 1 second of duplicate frames, but discard the rest.
ffmpeg -i in.mp4 -vf "select='if(gt(scene,0.01),st(1,t),lte(t-ld(1),1))',setpts=N/FRAME_RATE/TB" trimmed.mp4
What the select filter expression does is make use of an if-then-else operator:
gt(scene,0.01) checks if the current frame has detected motion relative to the previous frame. The value will have to be calibrated based on manual observation by seeing which value accurately captures actual activity as compared to sensor/compression noise or visual noise in the frame. See here on how to get a list of all scene change values.
If the frame is evaluated to have motion, the then clause evaluates st(1,t). The function st(val,expr) stores the value of expr in a variable numbered val and it also returns that expression value as its result. So, the timestamp of the kept frames will keep on being updated in that variable until a static frame is encountered.
The else clause checks the difference between the current frame timestamp and the timestamp of the stored value. If the difference is less than 1 second, the frame is kept, else discarded.
The setpts sanitizes the timestamps of all selected frames.
Edit: I tested my command with a video input I synthesized and it worked.
I've done a bit of work on this question... and have found the following works pretty well...
It seems like the input video has to have a "constant frame rate" for things to work properly, so the first command is:-
ffmpeg -i test.mp4 ^
-vf "scale=trunc(iw/2)*2:trunc(ih/2)*2" ^
-vsync cfr test01.mp4
I then need to look at the 'scores' for each frame. Such a listing is produced by:-
ffmpeg -i test01.mp4 ^
-vf select="'gte(scene,0)',metadata=print" -f null -
I'll look at all those scores... and average them (mean) - a bit dodgy but it seems to work Ok. In this example, that average score is '0.021187'.
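In case it's useful, those scores can be pulled out of the metadata=print output (which goes to stderr) with standard command-line tools and averaged. A rough sketch on Linux; the exact log format may differ between ffmpeg versions:

# Hypothetical one-liner: extract every scene_score value and print the mean.
ffmpeg -i test01.mp4 -vf "select='gte(scene,0)',metadata=print" -f null - 2>&1 \
  | grep -o 'scene_score=[0-9.]*' \
  | cut -d= -f2 \
  | awk '{ sum += $1; n++ } END { if (n) printf "%.6f\n", sum / n }'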
I then have to select a 'persistence' value -- how long to let the 'duplicated' frames run. If you force it to only keep one frame, the entire video will tend to run much too quickly... So, I've been using 0.2 seconds as a starting point.
So the next command becomes:-
ffmpeg -i test01.mp4 ^
-vf "select='if(gt(scene,0.021187),st(1,t),lte(t-ld(1),0.20))',
setpts=N/FRAME_RATE/TB" output.mp4
After that, the resulting 'output.mp4' video seems to work pretty well. Only a bit of fiddling with the 'persistence' value might be needed to strike a compromise between a smoother-playing video and scenes that change a bit abruptly.
I've put together some Perl code that works Ok, which I'll work out how to post, if folks are interested in it... eventually(!)
Edit: Another advantage of doing this 'decimating', is that files are of shorter duration (obviously) AND they are smaller in size. For example, a sample video that ran for 00:07:14 and was 22MB in size went to 00:05:35 and 11MB.
Variable frame rate encoding is totally possible, but I don't think it does what you think it does. I am assuming that you wish to remove these duplicate frames to save space/bandwidth? If so, it will not work, because the codec is already doing it. Codecs use reference frames and only encode what has changed from the reference, so the duplicate frames take almost no space to begin with. Basically, each frame is just encoded as a packet of data saying: copy the previous frame and make this change. The X frames have zero changes, so it only takes a few bytes to encode each one.
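One way to see this for yourself is to look at the compressed size of each video packet; packets covering the frozen 'X' stretches should be only a handful of bytes. A quick check along those lines (just a sketch, using the capture file named earlier):

# Print the timestamp and compressed size of every video packet.
ffprobe -v error -select_streams v:0 \
        -show_entries packet=pts_time,size -of csv=p=0 SAMPLE.flv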
I am concatenating multiple (max 25) audio files using SoX with
sox first.mp3 second.mp3 third.mp3 result.mp3
which does what it is supposed to: it concatenates the given files into one file. But unfortunately there is a small time gap between those files in result.mp3. Is there a way to remove this gap?
I am creating first.mp3, second.mp3 and so on by merging multiple audio files (same length/format/rate) before concatenating them:
sox -m drums.mp3 bass.mp3 guitar.mp3 first.mp3
How can I check and make sure that no time gap is added to any of those files (merged and concatenated)?
I need to achieve seamless playback of all the concatenated files (when playing them one after another in the browser, it works OK).
Thank you for any help.
EDIT:
The exact example (without real file-names) of a command I am running is now:
sox "|sox -m file1.mp3 file2.mp3 file3.mp3 file4.mp3 -p" "|sox -m file1.mp3 file6.mp3 file7.mp3 -p" "|sox -m file5.mp3 file6.mp3 file4.mp3 -p" "|sox -m file0.mp3 file2.mp3 file9.mp3 -p" "|sox -m file1.mp3 file15.mp3 file4.mp3 -p" result.mp3
This merges the files and pipes them directly into the concatenation command. The resulting mp3 (result.mp3) has an ever-so-slight delay between the concatenated files. Any ideas really appreciated.
The best — though least helpful — way to do this is not to use MP3 files as your source files. WAV, FLAC or M4A files don't have this problem.
MP3s aren't made up of fixed-rate samples, so cropping out a section of an arbitrary length will not work as you expect. Unless the encoder was smart (like lame), there will often be a gap at the start or end of the MP3 file's audio. I did a test with a sample 0.98s long (which is precisely 73½ CDDA frames, and many MP3 encoders use frames for minimum sample lengths). I then encoded the sample with three different MP3 encoders (lame, sox, and the ancient shine), then decoded those files with three decoders (lame, sox, and madplay). Here's how the sample lengths compare to the original:
Enc.→Dec. Length Samples CDDA Frames
----------------- --------- ------- -----------
shine→lame 0.95" 42095 71.5901
shine→madplay 0.97" 42624 72.4898
shine→sox 0.97" 42624 72.4898
lame→lame 0.98" 43218 73.5000
*Original 0.98" 43218 73.5000
sox→sox 0.99" 43776 74.4490
sox→lame 1.01" 44399 75.5085
lame→madplay 1.02" 44928 76.4082
lame→sox 1.02" 44928 76.4082
sox→madplay 1.02" 44928 76.4082
Only the file encoded and decoded by lame ended up the same length (mostly because lame inserts a length tag to correct for these too-short samples, and knows how to decode it). Everything encoded by sox ended up with a tiny gap, no matter what decoder I used. So joining the files will result in tiny clicks.
Your browser is likely mixing and overlapping the source files very slightly so you don't hear the clicks. Gapless playback is hard to do correctly.
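If changing the pipeline is an option, one workaround consistent with that advice is to keep every intermediate file as WAV and encode to MP3 only once, at the very end. A sketch along those lines (the stem names are hypothetical, mirroring the question's drums/bass/guitar example; sox needs MP3 support for the final step, which yours evidently has):

# Mix each group of stems to WAV, concatenate the WAVs, encode once at the end.
sox -m drums.wav bass.wav guitar.wav first.wav
sox -m drums2.wav bass2.wav guitar2.wav second.wav
sox first.wav second.wav result.mp3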
This is my guess for your issue:
sox does not add a time gap during concatenation; however, it does add a time gap in other operations, for instance if you do a conversion before the concatenation.
To find out what is happening, I suggest you check the durations of all your files at each step (you can use soxi, for instance) to see what's going on.
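For example, soxi's -D option prints each file's duration in seconds, which makes it easy to see at which step the extra time appears (using the question's file names):

# Durations before and after concatenation; one value per line.
soxi -D first.mp3 second.mp3 third.mp3 result.mp3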
If that doesn't work (i.e. the time gap is added during concatenation), let me make another guess:
sox adds a time gap because the samples at the beginning or at the end of your files are not close to zero.
To solve this, you could use a very short fade-in and fade-out on your files.
Moreover, to force sox to output files with a well-defined length, you could use the trim effect like this (note that effects go after the output file):
sox filein.mp3 fileout.mp3 trim 0 duration
First, you really need to check that the start and the end of your files contain no silence. I don't know if sox can do it, but you need to check the energy (RMS, dB) of the start and end of the audio signals and cut any leading/trailing silence. To join audio files without gaps, you need to apply a window function to the signal so it works like a fade-in/fade-out, and then crossfade the beginning of one file with the end of the other.
sox provide a splice function to crossfade:
splice [-h|-t|-q] { position[,excess[,leeway]] }
Splice together audio sections. This effect provides two things over simple audio concatenation: a (usually short) cross-fade is applied at the join, and a wave similarity comparison is made to help determine the best place at which to make the join.
Check the documentation here.
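To make that concrete, a typical splice invocation crossfades at the boundary between the two files, i.e. at the duration of the first one. A sketch with the question's file names, using the default excess/leeway values:

# Crossfade first.mp3 into second.mp3 at the point where first.mp3 ends;
# soxi -D prints the duration of first.mp3 in seconds (the splice position).
sox first.mp3 second.mp3 joined.mp3 splice "$(soxi -D first.mp3)"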