FFmpeg remove silence with exact duration detected by detect silence - audio

I have an audio file that contains some silences. I detect them with ffmpeg's silencedetect filter and then try to remove them with silenceremove, but the behavior is strange. Specifically:
1) File's Basic info based on ffprobe show_streams
Input #0, mp3, from 'my_file.mp3':
Metadata:
encoder : Lavf58.64.100
Duration: 00:00:25.22, start: 0.046042, bitrate: 32 kb/s
Stream #0:0: Audio: mp3, 24000 Hz, mono, fltp, 32 kb/s
2) Using silencedetect
ffmpeg -i my_file.mp3 -af silencedetect=noise=-50dB:d=0.2 -f null -
I get this result
[mp3float # 000001ee50074280] overread, skip -7 enddists: -1 -1
[silencedetect # 000001ee5008a1c0] silence_start: 6.21417
[silencedetect # 000001ee5008a1c0] silence_end: 6.91712 | silence_duration: 0.702958
[silencedetect # 000001ee5008a1c0] silence_start: 16.44
[silencedetect # 000001ee5008a1c0] silence_end: 17.1547 | silence_duration: 0.714708
[mp3float # 000001ee50074280] overread, skip -10 enddists: -3 -3
[mp3float # 000001ee50074280] overread, skip -5 enddists: -4 -4
[silencedetect # 000001ee5008a1c0] silence_start: 24.4501
size=N/A time=00:00:25.17 bitrate=N/A speed=1.32e+03x
video:0kB audio:1180kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown
[silencedetect # 000001ee5008a1c0] silence_end: 25.176 | silence_duration: 0.725917
That also matches the values and positions shown in Adobe Audition.
So far all good.
3) Now, based on some calculations (driven by the application's logic for what the final duration of the audio should be), I am trying to delete the silence with a duration of "0.725917"s. For that, I followed the ffmpeg docs (https://ffmpeg.org/ffmpeg-filters.html#silencedetect):
Trim all silence encountered from beginning to end where there is more
than 1 second of silence in audio:
silenceremove=stop_periods=-1:stop_duration=1:stop_threshold=-90dB
I run this command
ffmpeg -i my_file.mp3 -af silenceremove=stop_periods=-1:stop_threshold=-50dB:stop_duration=0.72 result1.mp3
So I expect it to delete only the silence with a duration of "0.725917"s (the last one in the output above); however, it deletes the silence that starts at 16.44s with a duration of "0.714708"s.
4) Running silencedetect on result1.mp3 with the same options gives even stranger results:
ffmpeg -i result1.mp3 -af silencedetect=noise=-50dB:d=0.2 -f null -
result
[mp3float # 0000017723404280] overread, skip -5 enddists: -4 -4
[silencedetect # 0000017723419540] silence_start: 6.21417
[silencedetect # 0000017723419540] silence_end: 6.92462 | silence_duration: 0.710458
[mp3float # 0000017723404280] overread, skip -7 enddists: -6 -6
[mp3float # 0000017723404280] overread, skip -7 enddists: -2 -2
[mp3float # 0000017723404280] overread, skip -6 enddists: -1 -1
Last message repeated 1 times
[silencedetect # 0000017723419540] silence_start: 23.7308
size=N/A time=00:00:24.45 bitrate=N/A speed=1.33e+03x
video:0kB audio:1146kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown
[silencedetect # 0000017723419540] silence_end: 24.456 | silence_duration: 0.725167
So, the results are:
With a command meant to remove silences longer than 0.72 seconds, a silence of "0.714708"s was removed, while the silence of "0.725917"s remained (though slightly changed, see the third point).
The first silence, which started at "6.21417" and had a duration of "0.702958"s, now has a duration of "0.710458"s.
The third silence, which started at "24.4501" (now 23.7308, because the second silence was removed) and had a duration of "0.725917"s, is now "0.725167"s. The difference is small, but why should removing another silence change this one's duration at all?
Accordingly the expected results are:
Only the silences that match the provided condition (stop_duration=0.72) should be removed. In this specific example that is only the last one, but in general it is any silence that meets the length condition, regardless of its position (start, end, or middle).
Other silences should remain with same exact duration they were before
FFMpeg: 4.2.4-1ubuntu0.1, Ubuntu: 20.04.2
Some attempts and results, while playing with ffmpeg options
a)
ffmpeg -i my_file.mp3 -af silenceremove=stop_periods=-1:stop_threshold=-50dB:stop_duration=0.72:detection=peak tmp1.mp3
result:
First and second silences are removed, 3rd silence's duration remains exactly the same
b)
ffmpeg -i my_file.mp3 -af silenceremove=stop_periods=-1:stop_threshold=-50dB:stop_duration=0.71 tmp_0.71.mp3
result:
First and second silences are removed, 3rd silence remains, but the duration becomes "0.72075"s
c)
ffmpeg -i my_file.mp3 -af silenceremove=stop_periods=-1:stop_threshold=-50dB:stop_duration=0.7 tmp_0.7.mp3
result:
all 3 silence are removed
d) the edge case
this command still removes the second silence (after which the first silence become exactly as in point #4 and last silence becomes "0.721375")
ffmpeg -i my_file.mp3 -af silenceremove=stop_periods=-1:stop_threshold=-50dB:stop_duration=0.72335499999 tmp_0.72335499999.mp3
but this one, again does not remove any silence:
ffmpeg -i my_file.mp3 -af silenceremove=stop_periods=-1:stop_threshold=-50dB:stop_duration=0.723355 tmp_0.723355.mp3
e) window param case 0.03
ffmpeg -i my_file.mp3 -af silenceremove=stop_periods=-1:stop_threshold=-50dB:stop_duration=0.72:window=0.03 window_0.03.mp3
does not remove any silence, but running silencedetect on the output
ffmpeg -i window_0.03.mp3 -af silencedetect=noise=-50dB:d=0.2 -f null -
gives this result (compare with silences in result1.mp3 - from point #4 )
[mp3float # 000001c5c8824280] overread, skip -5 enddists: -4 -4
[silencedetect # 000001c5c883a040] silence_start: 6.21417
[silencedetect # 000001c5c883a040] silence_end: 6.92462 | silence_duration: 0.710458
[mp3float # 000001c5c8824280] overread, skip -7 enddists: -6 -6
[mp3float # 000001c5c8824280] overread, skip -7 enddists: -2 -2
[silencedetect # 000001c5c883a040] silence_start: 16.4424
[silencedetect # 000001c5c883a040] silence_end: 17.1555 | silence_duration: 0.713167
[mp3float # 000001c5c8824280] overread, skip -6 enddists: -1 -1
Last message repeated 1 times
[silencedetect # 000001c5c883a040] silence_start: 24.4508
size=N/A time=00:00:25.17 bitrate=N/A speed=1.24e+03x
video:0kB audio:1180kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown
[silencedetect # 000001c5c883a040] silence_end: 25.176 | silence_duration: 0.725167
f) window case 0.01
ffmpeg -i my_file.mp3 -af silenceremove=stop_periods=-1:stop_threshold=-50dB:stop_duration=0.72:window=0.01 window_0.01.mp3
removes the first and second silences; silencedetect with the same parameters then gives the following result
[mp3float # 000001ea631d4280] overread, skip -5 enddists: -4 -4
Last message repeated 1 times
[mp3float # 000001ea631d4280] overread, skip -7 enddists: -2 -2
[mp3float # 000001ea631d4280] overread, skip -6 enddists: -1 -1
Last message repeated 1 times
[silencedetect # 000001ea631ea1c0] silence_start: 23.0108
size=N/A time=00:00:23.73 bitrate=N/A speed=1.2e+03x
video:0kB audio:1113kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown
[silencedetect # 000001ea631ea1c0] silence_end: 23.736 | silence_duration: 0.725167
Any thoughts, ideas, points are much appreciated.

You're suffering from two things:
You are converting back to mp3 (a lossy format), so result1.mp3 is re-encoded and ends up slightly different from a perfect cut. The fix is to work with .wav files (a lossless format).
The silenceremove function is using a window and you need to set it to 0 to do sample-by-sample.
ffmpeg -i my_file.mp3 my_file.wav
ffmpeg -i my_file.wav -af silencedetect=noise=-50dB:d=0.2 -f null -
ffmpeg -i my_file.wav -af silenceremove=stop_periods=-1:stop_threshold=-50dB:stop_duration=0.72:window=0 result1.wav
ffmpeg -i result1.wav -af silencedetect=noise=-50dB:d=0.2 -f null -
This is the output of the last command. I would consider this a solid solution, because the silence starts and durations match their values before the cut exactly:
[silencedetect # 0x5570a855b400] silence_start: 6.21417
[silencedetect # 0x5570a855b400] silence_end: 6.91712 | silence_duration: 0.702958
[silencedetect # 0x5570a855b400] silence_start: 16.44
[silencedetect # 0x5570a855b400] silence_end: 17.1547 | silence_duration: 0.714708
size=N/A time=00:00:24.45 bitrate=N/A speed=4.49e+03x
You can then reencode it to .mp3 if you want.
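If the goal is to drop exactly the intervals that silencedetect reported, one alternative (not what the commands above do) is to cut those intervals directly instead of letting silenceremove re-detect them. The following is a minimal Python sketch under that assumption. The helper names parse_silences and keep_filter are hypothetical, and the sketch only builds an atrim/concat filtergraph string; it does not run ffmpeg. The optional total argument (the file duration in seconds) is also an assumption of this sketch, used to avoid emitting an empty tail segment when the removed silence runs to the end of the file.

```python
import re

# Matches either a silence_start line or a silence_end line from silencedetect.
SILENCE_RE = re.compile(
    r"silence_start:\s*(-?[\d.]+)"
    r"|silence_end:\s*(-?[\d.]+)\s*\|\s*silence_duration:\s*([\d.]+)"
)


def parse_silences(log):
    """Return (start, end, duration) tuples from silencedetect log text."""
    silences, start = [], None
    for m in SILENCE_RE.finditer(log):
        if m.group(1) is not None:
            start = float(m.group(1))
        elif start is not None:
            silences.append((start, float(m.group(2)), float(m.group(3))))
            start = None
    return silences


def keep_filter(silences, min_duration, total=None):
    """Build an atrim/concat graph that drops silences >= min_duration."""
    cuts = [(s, e) for s, e, d in silences if d >= min_duration]
    parts, pos = [], 0.0
    for s, e in cuts:
        if s > pos:
            parts.append((pos, s))  # audio to keep before this cut
        pos = e
    if total is None or pos < total:
        parts.append((pos, total))  # tail after the last cut
    if not parts:
        raise ValueError("nothing left to keep")
    chains = []
    for i, (a, b) in enumerate(parts):
        end = "" if b is None else f":end={b}"
        chains.append(f"[0:a]atrim=start={a}{end},asetpts=PTS-STARTPTS[a{i}]")
    labels = "".join(f"[a{i}]" for i in range(len(parts)))
    return ";".join(chains) + f";{labels}concat=n={len(parts)}:v=0:a=1[out]"
```

The resulting string would be passed as ffmpeg -i my_file.wav -filter_complex "<graph>" -map "[out]" result.wav. With the silencedetect output from the question and min_duration=0.72, only the last silence is selected.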

Related

FFMpeg merge video and audio at specific time into another video

I have a standard mp4 (audio + video)
I am trying to merge a 1.4 second mini mp4 clip into this track, replacing the video for the length of the mini clip but merging the audios together at a specific time
Would anyone know how to do this using ffmpeg?
I've tried quite a few different filters, however can't seem to get what I want
V <------->
miniclip.mp4 A <=======>
V <-----------> ↓ + ↓ <--->
standard.mp4 A <=========================>
Example to show miniclip.mp4 (1.4 seconds long) at timestamp 5.
ffmpeg -i main.mp4 -i miniclip.mp4 -filter_complex "[0:v]drawbox=t=fill:enable='between(t,5,6.4)'[bg];[1:v]setpts=PTS+5/TB[fg];[bg][fg]overlay=x=(W-w)/2:y=(H-h)/2:eof_action=pass;[1:a]adelay=5s:all=1[a1];[0:a][a1]amix" output.mp4
drawbox covers the main video with black. It is only needed if miniclip.mp4 has a smaller width or height than main.mp4. You can omit it if miniclip.mp4's width and height are ≥ those of main.mp4. Alternatively, you could use the scale2ref filter to give miniclip.mp4 the same width and height as main.mp4.
setpts adds a 5 second offset to miniclip.mp4 video.
overlay overlays miniclip.mp4 video over main.mp4 video.
adelay adds a 5 second delay to miniclip.mp4 audio.
amix mixes miniclip.mp4 and main.mp4 audio.
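To reuse the command above for other timestamps, the filtergraph can be generated from the offset and clip length. This is a minimal Python sketch; build_filter and build_command are hypothetical helper names, and the sketch assumes adelay is given in milliseconds (its default unit) rather than with the s suffix used above, so 5 seconds becomes 5000.

```python
def build_filter(offset, clip_len):
    """Return the filter_complex string for a clip shown at `offset` seconds."""
    end = offset + clip_len
    delay_ms = round(offset * 1000)  # adelay takes milliseconds by default
    return (
        # black out the main video while the mini clip is on screen
        f"[0:v]drawbox=t=fill:enable='between(t,{offset:g},{end:g})'[bg];"
        # shift the mini clip's video timestamps by the offset
        f"[1:v]setpts=PTS+{offset:g}/TB[fg];"
        # centre the mini clip over the main video
        "[bg][fg]overlay=x=(W-w)/2:y=(H-h)/2:eof_action=pass;"
        # delay the mini clip's audio by the same offset, then mix
        f"[1:a]adelay={delay_ms}:all=1[a1];"
        "[0:a][a1]amix"
    )


def build_command(main, clip, output, offset, clip_len):
    """Return the ffmpeg argument list (suitable for subprocess.run)."""
    return ["ffmpeg", "-i", main, "-i", clip,
            "-filter_complex", build_filter(offset, clip_len), output]
```

For example, build_command("main.mp4", "miniclip.mp4", "output.mp4", 5, 1.4) reproduces the case discussed above.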
More info
See FFmpeg Filter Documentation for info on each filter.
How to get video duration
Edited (now that I understand the question):
First, extract 1.4 seconds from standard.mp4 and from audio1.mp3.
-ss sets the start position of the short clip, and -t sets its duration (1.4 seconds here). In short: cut 1.4 seconds of video starting at minute 5.
-an drops the audio, because you want to add the new audio from audio1.mp3.
Video only:
ffmpeg -ss 00:05:00 -i standard.mp4 -t 1.4 -map 0:v -c copy -an small_only_video.mp4
Audio only:
ffmpeg -ss 00:05:00 -i audio1.mp3 -t 1.4 -c copy small_only_audio.mp3
Now you can create small_clip_audiovideo.mp4:
ffmpeg -i small_only_video.mp4 -c:a mp3 -i small_only_audio.mp3 -c copy -map 0:v -map 1:a:0 -disposition:a:0 default -disposition:a:1 default -strict -2 -sn -dn -map_metadata -1 -map_chapters -1 -movflags faststart small_clip_audiovideo.mp4
V <------->
miniclip.mp4 A <=======>
V <-----------> ↓ + ↓ <------->
standard.mp4 A <=============================>
|--|--|--|--|--|--|--|--|--|--|
0 1 2 3 4 5 6 7 8 9 10
standard.mp4 is about 10 seconds long and has audio and video.
miniclip.mp4 is about 3 seconds long and has video and audio.
Check whether both files use the same video and audio codecs:
ffmpeg -i standard.mp4
ffmpeg -i miniclip.mp4
If the audio or video codecs of standard.mp4 and miniclip.mp4 differ, you will need to re-encode before continuing to get a clean result.
Cut seconds 0 to 4 of standard.mp4 into 01.part_project.mp4:
ffmpeg -ss 00:00:00 -i standard.mp4 -t 4 -c copy 01.part_project.mp4
and seconds 7 to 10 into 03.part_project.mp4:
ffmpeg -ss 00:00:07.000 -i standard.mp4 -t 3.0000 -c copy 03.part_project.mp4
Rename miniclip.mp4, or copy it, to 02.part_project.mp4:
cp miniclip.mp4 02.part_project.mp4
(The 4 to 7 second part of standard.mp4 is only needed if you choose option 2, which uses just its audio: santadard_part2_audio.mp4.)
OPTION 1: concatenate (join) the 3 video parts.
Make a folder "option1" and copy 01.part_project.mp4, 02.part_project.mp4, and 03.part_project.mp4 into it:
mkdir option1 && cp 01.part_project.mp4 02.part_project.mp4 03.part_project.mp4 ./option1 && cd ./option1
Now concatenate 01.part_project.mp4 + 02.part_project.mp4 + 03.part_project.mp4 into a single file, fin_option1.mp4:
ffmpeg -f concat -safe 0 -i <(for f in ./*.mp4; do echo "file '$PWD/$f'"; done) -c copy fin_option1.mp4
V <------->
miniclip.mp4 A <=======>
V <-----------> ↓ + ↓ <------->
standard.mp4 A <============XXXXXXXXX========>
|--|--|--|--|--|--|--|--|--|--|
0 1 2 3 4 5 6 7 8 9 10
OPTION 2: concatenate (join) the 3 video parts, but mix the audio of miniclip.mp4 with the audio of santadard_part2_audio.mp4.
Extract the audio stream from santadard_part2_audio.mp4 and the audio stream from miniclip.mp4:
ffmpeg -i santadard_part2_audio.mp4 -map 0:a -c copy -vn -strict -2 mix_audio_santadad.mp4
ffmpeg -i miniclip.mp4 -map 0:a -c copy -vn -strict -2 mix_audio_miniclip.mp4
Mix both audio streams** into one, to be combined with the video from miniclip.mp4:
ffmpeg -i mix_audio_miniclip.mp4 -i mix_audio_santadad.mp4 -filter_complex amix=inputs=2:duration=longest -strict -2 audio_mixed_miniclip.mp4
Extract only the video from miniclip.mp4:
ffmpeg -i miniclip.mp4 -c copy -an miniclip_video.mp4
Then build the miniclip with the mixed audio. I think this is the solution you are looking for:
ffmpeg -i miniclip_video.mp4 -i audio_mixed_miniclip.mp4 -c copy -map 0:v -map 1:a:0 -disposition:a:0 default -disposition:a:1 default -strict -2 -sn -dn -map_metadata -1 -map_chapters -1 -movflags faststart 02.part_project_OPTION2.mp4
santadard_part2_audio.mp4
+
audio_miniclip.mp4
V <------->
miniclip.mp4 A <MMMMMMMM> (audio miniclip mixed with standard.mp4)
V <-----------> ↓ + ↓ <------->
standard.mp4 A <============ ========>
|--|--|--|--|--|--|--|--|--|--|
0 1 2 3 4 5 6 7 8 9 10
Make a folder "option2" and copy 01.part_project.mp4, 02.part_project_OPTION2.mp4, and 03.part_project.mp4 into it, then concatenate them:
mkdir option2 && cp 01.part_project.mp4 02.part_project_OPTION2.mp4 03.part_project.mp4 ./option2 && cd ./option2
ffmpeg -f concat -safe 0 -i <(for f in ./*.mp4; do echo "file '$PWD/$f'"; done) -c copy fin_option2.mp4
NOTES
** Many other audio manipulations are possible: https://trac.ffmpeg.org/wiki/AudioChannelManipulation

FFMpeg ZeroMQ Filter stops working after a short while

I run FFMpeg as follows:
#!/bin/bash
fc="[1]scale=iw/2:ih/2 [pip]; [pip] zmq=bind_address=tcp\\\://127.0.0.1\\\:1235,[0]overlay=x=0:y=0"
ffmpeg -v verbose -re -y -i test.mkv -i test2.mkv -filter_complex "$fc" -f mpegts -codec:v libx264 -preset ultrafast resultzmq.mp4
I then start a Python 3 app to send zmq commands to FFMpeg:
import zmq
import time
import sys
from multiprocessing import Process
context = zmq.Context()
port = "1235"
print("Connecting to server with port {}".format(port))
socket = context.socket(zmq.REQ)
socket.connect("tcp://localhost:{}".format(port))
for request in range(20):
    print("Sending request ", request, "...")
    socket.send_string("Parsed_overlay_2 x 200")
    message = socket.recv()
    print("Received reply ", request, "[", message, "]")
    time.sleep(1)
This runs fine until about 40 seconds in, when I get this from FFmpeg (it stops receiving commands):
frame= 918 fps= 24 q=19.0 size= 12192kB time=00:00:38.82 bitrate=2572.6kbits
frame= 931 fps= 24 q=19.0 size= 12402kB time=00:00:39.30 bitrate=2585.1kbits
[Parsed_zmq_1 # 0x56185e089220] Processing command #8 target:Parsed_overlay_2 command:x arg:200
[Parsed_zmq_1 # 0x56185e089220] Sending command reply for command #8:
0 Success
frame= 938 fps= 24 q=19.0 size= 12516kB time=00:00:39.82 bitrate=2574.1kbits
frame= 952 fps= 24 q=19.0 size= 12752kB time=00:00:40.33 bitrate=2590.0kbits
[Parsed_zmq_1 # 0x56185e089220] Processing command #9 target:Parsed_overlay_2 command:x arg:200
[Parsed_zmq_1 # 0x56185e089220] Sending command reply for command #9:
0 Success
frame= 963 fps= 24 q=19.0 size= 12932kB time=00:00:40.81 bitrate=2595.6kbits
frame= 976 fps= 24 q=19.0 size= 13121kB time=00:00:41.31 bitrate=2601.4kbits
frame= 992 fps= 24 q=19.0 size= 13434kB time=00:00:41.84 bitrate=2629.9kbits
frame= 1002 fps= 24 q=18.0 size= 13582kB time=00:00:42.34 bitrate=2627.2kbits
and this from the Python 3 client:
Sending request 8 ...
Received reply 8 [ b'0 Success' ]
Sending request 9 ...
Received reply 9 [ b'0 Success' ]
Sending request 10 ...
The disconnect always happens at the same time, no matter when I start the Python client. If I start it after 40 seconds, it won't send any commands at all.
On my actual application, the same thing happens but at about 60 seconds.
I tried setting up a simple Python server/client and the problem does not occur. So I assume the problem must have something to do with FFMpeg and its zmq plugin?
If you would like to test this yourself, just make sure test.mkv and test2.mkv are videos longer than 1 minute.
I would really appreciate any assistance!
After aimlessly changing the code for the better part of the day, I finally found the solution:
#!/bin/bash
fc="[1]scale=iw/2:ih/2,[0]overlay=x=0:y=0,zmq=bind_address=tcp\\\://127.0.0.1\\\:1235 "
ffmpeg -v verbose -re -y -i test.mkv -i server_upgrade_2.mkv -filter_complex "$fc" -f mpegts -codec:v libx264 -preset ultrafast resultzmq.mp4
My guess is that even though the position of the zmq filter does not matter when you try to issue commands (you can issue commands to all the filters), when the input to the zmq filter ends, so does the zmq filter.
Using the REQ/REP archetype in any serious, production-grade distributed system is indeed a
Highway to Hell
Never opt for the deceptively simple-looking REQ/REP. Never. It can and will fall into an unsalvageable mutual deadlock. The question is not if, but when.
I have not found any explicit reason why FFmpeg uses REP, or whether it could switch to a more suitable archetype, such as PAIR/PAIR for pipeline-filter inter-node processing, PUSH/PULL, or a more advanced composite signalling/messaging layer. My other posts here on ZeroMQ give more reasoning and examples.
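Since the ffmpeg side is fixed to REP, one way to keep the client from blocking forever when replies stop is to wait for each reply with a timeout and discard the socket afterwards. The following is a minimal sketch using pyzmq, not a full reliability pattern. The helper names format_command and send_command are hypothetical, the 2000 ms default timeout is an arbitrary assumption, and pyzmq is imported inside send_command only so that the formatting helper can be used without it.

```python
def format_command(target, command, arg):
    """Build the 'TARGET COMMAND ARG' string that the zmq filter expects."""
    return f"{target} {command} {arg}"


def send_command(endpoint, message, timeout_ms=2000):
    """Send one command over a fresh REQ socket; return the reply or None."""
    import zmq  # pyzmq, imported lazily (assumption: installed separately)

    ctx = zmq.Context.instance()
    sock = ctx.socket(zmq.REQ)
    sock.setsockopt(zmq.LINGER, 0)  # do not hang on close if nothing answered
    sock.connect(endpoint)
    try:
        sock.send_string(message)
        # poll() returns 0 when no reply arrives within timeout_ms
        if sock.poll(timeout_ms, zmq.POLLIN):
            return sock.recv_string()
        return None
    finally:
        # a REQ socket that never got its reply cannot send again, so drop it
        sock.close()
```

A caller would use send_command("tcp://localhost:1235", format_command("Parsed_overlay_2", "x", 200)) and treat a None result as "ffmpeg is no longer answering" instead of hanging on recv().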

Piping pi's opencv video to ffmpeg for Youtube streaming

This is a small python3 script reading off picam using OpenCV :
#picamStream.py
import sys, os
from picamera.array import PiRGBArray
from picamera import PiCamera
import time
import cv2
# initialize the camera and grab a reference to the raw camera capture
camera = PiCamera()
camera.resolution = (960, 540)
camera.framerate = 30
rawCapture = PiRGBArray(camera, size=(960, 540))
# allow the camera to warmup
time.sleep(0.1)
# capture frames from the camera
for frame in camera.capture_continuous(rawCapture, format="bgr", use_video_port=True):
    image = frame.array
    # ---------------------------------
    # .
    # Opencv image processing goes here
    # .
    # ---------------------------------
    os.write(1, image.tobytes())  # write the raw BGR bytes to stdout (fd 1)
    # clear the stream in preparation for the next frame
    rawCapture.truncate(0)
# end
I am trying to pipe it to ffmpeg to stream to YouTube.
My understanding is that I need to combine the two commands below into a new ffmpeg command.
Piping picam live video to ffmpeg for Youtube streaming.
raspivid -o - -t 0 -vf -hf -w 960 -h 540 -fps 25 -b 1000000 | ffmpeg -re -ar 44100 -ac 2 -acodec pcm_s16le -f s16le -ac 2 -i /dev/zero -f h264 -i - -vcodec copy -acodec aac -ab 128k -g 50 -strict experimental -f flv rtmp://a.rtmp.youtube.com/live2/[STREAMKEY]
Piping OPENCV raw video to ffmpeg for mp4 file.
python3 picamStream.py | ffmpeg -f rawvideo -pixel_format bgr24 -video_size 960x540 -framerate 30 -i - foo.mp4
So far I've had no luck. Can anyone help me with this?
This is the program I use in raspberry pi.
#main.py
import subprocess
import cv2
cap = cv2.VideoCapture(0)
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 640)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)
command = ['ffmpeg',
'-f', 'rawvideo',
'-pix_fmt', 'bgr24',
'-s','640x480',
'-i','-',
'-ar', '44100',
'-ac', '2',
'-acodec', 'pcm_s16le',
'-f', 's16le',
'-ac', '2',
'-i','/dev/zero',
'-acodec','aac',
'-ab','128k',
'-strict','experimental',
'-vcodec','h264',
'-pix_fmt','yuv420p',
'-g', '50',
'-vb','1000k',
'-profile:v', 'baseline',
'-preset', 'ultrafast',
'-r', '30',
'-f', 'flv',
'rtmp://a.rtmp.youtube.com/live2/[STREAMKEY]']
pipe = subprocess.Popen(command, stdin=subprocess.PIPE)
while True:
    ok, frame = cap.read()
    if not ok:  # stop when the camera returns no frame
        break
    pipe.stdin.write(frame.tobytes())  # raw BGR bytes go to ffmpeg's stdin
pipe.kill()
cap.release()
Youtube needs an audio source, so use -i /dev/zero.
I hope it helps you.
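One detail worth checking with this approach: rawvideo has no headers, so ffmpeg splits stdin into frames purely by size, and for bgr24 that is width × height × 3 bytes. If the camera does not honour the requested resolution, the frames written to the pipe will not line up with the -s value. A small sketch of that arithmetic follows; the helper names are hypothetical.

```python
def bgr24_frame_bytes(width, height):
    """Bytes ffmpeg consumes per frame for -pix_fmt bgr24 (3 bytes per pixel)."""
    return width * height * 3


def frame_matches(shape, width, height):
    """True if an OpenCV frame's shape is (height, width, 3)."""
    return tuple(shape) == (height, width, 3)
```

In the loop above, frame_matches(frame.shape, 640, 480) could be checked once before writing the first frame.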

ffmpeg not resize video with dvd format

I have a video and an audio file. I'm joining them and cutting out a piece of the video, and that works:
ffmpeg -ss 0:0:1.950 -i "video.avi" -ss 0:0:1.950 -i "audio.mp3" -target pal-dvd -bufsize 9175040 -muxrate 50400000 -acodec ac3 -ac 2 -ab 128k -ar 44100 -t 0:0:5.997 -y "output.mpg"
The problem comes when I try to resize the video using the -vf filter, for example:
ffmpeg -ss 0:0:1.950 -i "video.avi" -ss 0:0:1.950 -i "audio.mp3" -vf scale="1024:420" -target pal-dvd -bufsize 9175040 -muxrate 50400000 -acodec ac3 -ac 2 -ab 128k -ar 44100 -t 0:0:5.997 -y "output.mpg"
It doesn't work because of the -target pal-dvd argument. If I remove this argument, the video is resized but doesn't keep the quality I want.
-target pal-dvd is equal to -c:v mpeg2video -c:a ac3 -f dvd -s 720x576 -r 25 -pix_fmt yuv420p -g 15 -b:v 6000000 -maxrate:v 9000000 -minrate:v 0 -bufsize:v 1835008 -packetsize 2048 -muxrate 10080000 -b:a 448000 -ar 48000. Your other options override these defaults, so you can simply use these options directly and remove the -s 720x576 and use your own size instead.
I'm not sure why you want to resize to 1024x420 and then use -target pal-dvd, but this option implies additional options. From ffmpeg_opt.c:
} else if (!strcmp(arg, "dvd")) {
opt_video_codec(o, "c:v", "mpeg2video");
opt_audio_codec(o, "c:a", "ac3");
parse_option(o, "f", "dvd", options);
parse_option(o, "s", norm == PAL ? "720x576" : "720x480", options);
parse_option(o, "r", frame_rates[norm], options);
parse_option(o, "pix_fmt", "yuv420p", options);
opt_default(NULL, "g", norm == PAL ? "15" : "18");
opt_default(NULL, "b:v", "6000000");
opt_default(NULL, "maxrate:v", "9000000");
opt_default(NULL, "minrate:v", "0"); // 1500000;
opt_default(NULL, "bufsize:v", "1835008"); // 224*1024*8;
opt_default(NULL, "packetsize", "2048"); // from www.mpucoder.com: DVD sectors contain 2048 bytes of data, this is also the size of one pack.
opt_default(NULL, "muxrate", "10080000"); // from mplex project: data_rate = 1260000. mux_rate = data_rate * 8
opt_default(NULL, "b:a", "448000");
parse_option(o, "ar", "48000", options);
Also, option placement matters. If you want to resize and use -target then place the filtering after -target. Note that this will probably resize twice.
Or omit -target and manually declare each option and modify them to your desired specifications.
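The "omit -target and declare each option" route can be sketched as a small Python helper that emits the explicit options listed above with the frame size replaced. The function names are hypothetical and the values are copied from the expansion quoted in this thread; the sketch only builds an argument list and does not run ffmpeg or add the seeking and audio overrides from the question.

```python
def pal_dvd_options(size="720x576"):
    """Explicit equivalents of -target pal-dvd, with an overridable size."""
    return [
        "-c:v", "mpeg2video", "-c:a", "ac3", "-f", "dvd",
        "-s", size, "-r", "25", "-pix_fmt", "yuv420p",
        "-g", "15", "-b:v", "6000000",
        "-maxrate:v", "9000000", "-minrate:v", "0",
        "-bufsize:v", "1835008", "-packetsize", "2048",
        "-muxrate", "10080000", "-b:a", "448000", "-ar", "48000",
    ]


def build_command(video, audio, output, size):
    """Return an ffmpeg argument list that never passes -target."""
    return ["ffmpeg", "-i", video, "-i", audio,
            *pal_dvd_options(size), "-y", output]
```

For example, build_command("video.avi", "audio.mp3", "output.mpg", "1024x420") yields a command whose only size setting is -s 1024x420, so nothing resizes the video a second time.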

How to split video or audio by silent parts

I need to automatically split a video of a speech by words, so that every word is a separate video file. Do you know any way to do this?
My plan was to detect silent parts and use them as word separators. But I didn't find any tool to do this, and it looks like ffmpeg is not the right tool for that.
You could first use ffmpeg to detect intervals of silence, like this
ffmpeg -i "input.mov" -af silencedetect=noise=-30dB:d=0.5 -f null - 2> vol.txt
This writes ffmpeg's log to vol.txt, with readings that look like this:
[silencedetect # 00000000004b02c0] silence_start: -0.0306667
[silencedetect # 00000000004b02c0] silence_end: 1.42767 | silence_duration: 1.45833
[silencedetect # 00000000004b02c0] silence_start: 2.21583
[silencedetect # 00000000004b02c0] silence_end: 2.7585 | silence_duration: 0.542667
[silencedetect # 00000000004b02c0] silence_start: 3.1315
[silencedetect # 00000000004b02c0] silence_end: 5.21833 | silence_duration: 2.08683
[silencedetect # 00000000004b02c0] silence_start: 5.3895
[silencedetect # 00000000004b02c0] silence_end: 7.84883 | silence_duration: 2.45933
[silencedetect # 00000000004b02c0] silence_start: 8.05117
[silencedetect # 00000000004b02c0] silence_end: 10.0953 | silence_duration: 2.04417
[silencedetect # 00000000004b02c0] silence_start: 10.4798
[silencedetect # 00000000004b02c0] silence_end: 12.4387 | silence_duration: 1.95883
[silencedetect # 00000000004b02c0] silence_start: 12.6837
[silencedetect # 00000000004b02c0] silence_end: 14.5572 | silence_duration: 1.8735
[silencedetect # 00000000004b02c0] silence_start: 14.9843
[silencedetect # 00000000004b02c0] silence_end: 16.5165 | silence_duration: 1.53217
You then generate commands to split from each silence end to the next silence start. You will probably want to add some handles of, say, 250 ms on each side, so each clip will be 2 × 250 ms longer.
ffmpeg -ss <silence_end - 0.25> -t <next_silence_start - silence_end + 2 * 0.25> -i input.mov word-N.mov
(I have skipped specifying audio/video parameters)
You'll want to write a script to scrape the console log and generate a structured (maybe CSV) file with the timecodes - one pair on each line: silence_end and the next silence_start. And then another script to generate the commands with each pair of numbers.
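The scraping and command generation described above could look roughly like this in Python. It is a minimal sketch: the function names are hypothetical, the input and output file names follow the example command, and the start time is clamped at 0 (an assumption of this sketch) so that the first handle cannot produce a negative -ss.

```python
import re


def parse_events(log):
    """Yield ('start' | 'end', seconds) for each silencedetect reading."""
    for m in re.finditer(r"silence_(start|end):\s*(-?\d+(?:\.\d+)?)", log):
        yield m.group(1), float(m.group(2))


def speech_segments(log):
    """Pair each silence_end with the next silence_start."""
    segments, last_end = [], None
    for kind, t in parse_events(log):
        if kind == "end":
            last_end = t
        elif last_end is not None:
            segments.append((last_end, t))
            last_end = None
    return segments


def split_commands(segments, src="input.mov", pad=0.25):
    """Return one ffmpeg command string per segment, with `pad` s handles."""
    cmds = []
    for n, (start, end) in enumerate(segments, 1):
        ss = max(0.0, start - pad)  # clamp so -ss is never negative
        t = (end + pad) - ss
        cmds.append(f"ffmpeg -ss {ss:.3f} -t {t:.3f} -i {src} word-{n}.mov")
    return cmds
```

Reading vol.txt and printing split_commands(speech_segments(text)) gives one command per word, which can then be reviewed before running.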

Resources