open h.264 video stream with gpu - python-3.x

I decode H.264 on a Jetson Nano using OpenCV.
I use this code:
import cv2
try:
    cap = cv2.VideoCapture('udp://234.0.0.0:46002', cv2.CAP_FFMPEG)
    print(f"cap = {cap}")
except Exception as e:
    print(f"Error: {e}")
if not cap.isOpened():
    print('VideoCapture not opened')
    exit(-1)
while True:
    ret, frame = cap.read()
    # print(f"frame = {frame}")
    try:
        cv2.imshow('Image', frame)
    except Exception as e:
        print(e)
    if cv2.waitKey(1) & 0XFF == ord('q'):
        break
cap.release()
cv2.destroyAllWindows()
Everything works fine.
Now I want to optimize my code by decoding on the GPU. How can I do this?
I have seen this option:
cap = cv2.VideoCapture('filesrc location=sample2.mp4 ! qtdemux ! queue ! h264parse ! omxh264dec ! nvvidconv ! video/x-raw,format=BGRx ! queue ! videoconvert ! queue ! video/x-raw, format=BGR ! appsink', cv2.CAP_GSTREAMER)
but my source is a URL.
I would be grateful for any help on how to decode H.264 from a URL in Python using the GPU.

I ran the FFmpeg command on my computer to get information about the video, and I got this output:
ffmpeg -hide_banner -loglevel debug -i udp://127.0.0.0:46002 -f xv display
Splitting the commandline.
Reading option '-hide_banner' ... matched as option 'hide_banner' (do not
show program banner) with argument '1'.
Reading option '-loglevel' ... matched as option 'loglevel' (set logging
level) with argument 'debug'.
Reading option '-i' ... matched as input url with argument
'udp://127.0.0.0:46002'.
Reading option '-f' ... matched as option 'f' (force format) with argument
'xv'.
Reading option 'display' ... matched as output url.
Finished splitting the commandline.
Parsing a group of options: global .
Applying option hide_banner (do not show program banner) with argument 1.
Applying option loglevel (set logging level) with argument debug.
Successfully parsed a group of options.
Parsing a group of options: input url udp://127.0.0.0:46002.
Successfully parsed a group of options.
Opening an input file: udp://127.0.0.0:46002.
[NULL # 0000020a7c5ded80] Opening 'udp://127.0.0.0:46002' for reading
[udp # 0000020a7c5cb700] No default whitelist set
[udp # 0000020a7c5cb700] end receive buffer size reported is 393216
[h264 # 0000020a7c5ded80] Format h264 probed with size=32768 and score=51
[h264 # 0000020a7c5ded80] Before avformat_find_stream_info() pos: 0 bytes
read:33339 seeks:0 nb_streams:1
[h264 # 0000020a7c631340] non-existing PPS 0 referenced
[extract_extradata # 0000020a7c60eec0] nal_unit_type: 1(Coded slice of a
non-IDR picture), nal_ref_idc: 2
Last message repeated 1 times
[h264 # 0000020a7c631340] nal_unit_type: 1(Coded slice of a non-IDR
picture), nal_ref_idc: 2
Last message repeated 1 times
[h264 # 0000020a7c631340] non-existing PPS 0 referenced
[h264 # 0000020a7c631340] decode_slice_header error
[h264 # 0000020a7c631340] non-existing PPS 0 referenced
[h264 # 0000020a7c631340] decode_slice_header error
[h264 # 0000020a7c631340] no frame!
[h264 # 0000020a7c631340] non-existing PPS 0 referenced
[extract_extradata # 0000020a7c60eec0] nal_unit_type: 1(Coded slice of a
non-IDR picture), nal_ref_idc: 2
Last message repeated 1 times
[h264 # 0000020a7c631340] nal_unit_type: 1(Coded slice of a non-IDR
picture), nal_ref_idc: 2
Last message repeated 1 times
[h264 # 0000020a7c631340] non-existing PPS 0 referenced
[h264 # 0000020a7c631340] decode_slice_header error
[h264 # 0000020a7c631340] non-existing PPS 0 referenced
[h264 # 0000020a7c631340] decode_slice_header error
[h264 # 0000020a7c631340] no frame!
[h264 # 0000020a7c631340] non-existing PPS 0 referenced
[extract_extradata # 0000020a7c60eec0] nal_unit_type: 1(Coded slice of a
non-IDR picture), nal_ref_idc: 2
Last message repeated 1 times
[h264 # 0000020a7c631340] nal_unit_type: 1(Coded slice of a non-IDR
picture), nal_ref_idc: 2
Last message repeated 1 times
[h264 # 0000020a7c631340] non-existing PPS 0 referenced
[h264 # 0000020a7c631340] decode_slice_header error
[h264 # 0000020a7c631340] non-existing PPS 0 referenced
[h264 # 0000020a7c631340] decode_slice_header error
[h264 # 0000020a7c631340] no frame!
[h264 # 0000020a7c631340] non-existing PPS 0 referenced
[extract_extradata # 0000020a7c60eec0] nal_unit_type: 1(Coded slice of a
non-IDR picture), nal_ref_idc: 2
Last message repeated 1 times
[h264 # 0000020a7c631340] nal_unit_type: 1(Coded slice of a non-IDR
picture), nal_ref_idc: 2
Last message repeated 1 times
[h264 # 0000020a7c631340] non-existing PPS 0 referenced
[h264 # 0000020a7c631340] decode_slice_header error
[h264 # 0000020a7c631340] non-existing PPS 0 referenced
[h264 # 0000020a7c631340] decode_slice_header error
[h264 # 0000020a7c631340] no frame!
[extract_extradata # 0000020a7c60eec0] nal_unit_type: 7(SPS), nal_ref_idc:3
[extract_extradata # 0000020a7c60eec0] nal_unit_type: 8(PPS), nal_ref_idc:3
[extract_extradata # 0000020a7c60eec0] nal_unit_type: 5(IDR), nal_ref_idc:3
Last message repeated 1 times
[h264 # 0000020a7c631340] nal_unit_type: 7(SPS), nal_ref_idc: 3
[h264 # 0000020a7c631340] nal_unit_type: 8(PPS), nal_ref_idc: 3
[h264 # 0000020a7c631340] nal_unit_type: 5(IDR), nal_ref_idc: 3
Last message repeated 1 times
[h264 # 0000020a7c631340] Format yuv420p chosen by get_format().
[h264 # 0000020a7c631340] Reinit context to 720x576, pix_fmt: yuv420p
[h264 # 0000020a7c631340] nal_unit_type: 1(Coded slice of a non-IDR
picture), nal_ref_idc: 2
Last message repeated 11 times
[h264 # 0000020a7c5ded80] max_analyze_duration 5000000 reached at 5000000
microseconds st:0
[h264 # 0000020a7c5ded80] After avformat_find_stream_info() pos: 971047
bytes read:971495 seeks:0 frames:128
Input #0, h264, from 'udp://127.0.0.0:46002':
Duration: N/A, bitrate: N/A
Stream #0:0, 128, 1/1200000: Video: h264 (Constrained Baseline), 1
reference frame, yuv420p(progressive, left), 720x576, 0/1, 25 fps, 25 tbr,
1200k tbn, 50 tbc
Successfully opened the file.
Parsing a group of options: output url display.
Applying option f (force format) with argument xv.
Successfully parsed a group of options.
Opening an output file: display.
[NULL # 0000020a7ce73000] Requested output format 'xv' is not a suitable
output format
display: Invalid argument
[AVIOContext # 0000020a7c610300] Statistics: 971495 bytes read, 0 seeks

You can use uridecodebin, which handles many kinds of URIs, containers, protocols and codecs.
On Jetson, the decoder that uridecodebin selects for H.264 is nvv4l2decoder. It doesn't use the GPU, but rather the dedicated hardware decoder NVDEC, which is better suited to the task.
nvv4l2decoder outputs NV12 frames into NVMM memory, while the OpenCV appsink expects BGR frames in system memory. So you would use the hardware converter nvvidconv to convert and copy into system memory. Unfortunately, nvvidconv doesn't support BGR, so first convert to the supported BGRx format with nvvidconv, then use the CPU plugin videoconvert for the BGRx -> BGR conversion, such as:
pipeline='uridecodebin uri=rtsp://127.0.0.1:8554/test ! nvvidconv ! video/x-raw,format=BGRx ! videoconvert ! video/x-raw,format=BGR ! appsink drop=1'
cap = cv2.VideoCapture(pipeline, cv2.CAP_GSTREAMER)
That is the general way.
For some streaming protocols, though, it may not be so simple.
For RTP-H264/UDP, the FFmpeg backend may only work with an SDP file.
With the GStreamer backend you would instead use a pipeline such as:
pipeline='udpsrc port=46002 multicast-group=234.0.0.0 ! application/x-rtp,encoding-name=H264 ! rtpjitterbuffer latency=500 ! rtph264depay ! h264parse ! nvv4l2decoder ! nvvidconv ! video/x-raw,format=BGRx ! videoconvert ! video/x-raw,format=BGR ! appsink drop=1'
Since the FFmpeg backend works for you, I'd speculate that the received stream is using RTP-MP2T. So you could try:
# Using NVDEC, but this may fail depending on sender side's codec:
cap = cv2.VideoCapture('udpsrc multicast-group=234.0.0.0 port=46002 ! application/x-rtp,media=video,encoding-name=MP2T,clock-rate=90000,payload=33 ! rtpjitterbuffer latency=300 ! rtpmp2tdepay ! tsdemux ! h264parse ! nvv4l2decoder ! nvvidconv ! video/x-raw,format=BGRx ! videoconvert ! video/x-raw,format=BGR ! appsink drop=1', cv2.CAP_GSTREAMER)
# Or using CPU (may not support high pixel rate with Nano):
cap = cv2.VideoCapture('udpsrc multicast-group=234.0.0.0 port=46002 ! application/x-rtp,media=video,encoding-name=MP2T,clock-rate=90000,payload=33 ! rtpjitterbuffer latency=300 ! rtpmp2tdepay ! tsdemux ! h264parse ! avdec_h264 ! videoconvert ! video/x-raw,format=BGR ! appsink drop=1', cv2.CAP_GSTREAMER)
[Note that I'm not familiar with 234.0.0.0, so I'm unsure whether multicast-group should be used as I did.]
If this doesn't work, try to get more information about the received stream. Since ffmpeg works for you, you may use it like this:
ffmpeg -hide_banner -loglevel debug -i udp://234.0.0.0:46002 -f xv display
If you see:
Stream #0:0, 133, 1/1200000: Video: h264 (Constrained Baseline), 1 reference frame, yuv420p(progressive, left), 720x576, 0/1, 25 fps, 25 tbr, 1200k tbn, 50 tbc
you may have to change clock-rate to 1200000 (default value is 90000):
application/x-rtp,media=video,encoding-name=MP2T,clock-rate=1200000
This assumes the stream is MPEG-2 TS. In that case, the first lines of the log show:
...
Opening an input file: udp://127.0.0.1:5002.
[NULL # 0x55761c4690] Opening 'udp://127.0.0.1:5002' for reading
[udp # 0x55761a27c0] No default whitelist set
[udp # 0x55761a27c0] end receive buffer size reported is 131072
[mpegts # 0x55761c4690] Format mpegts probed with size=2048 and score=47
[mpegts # 0x55761c4690] stream=0 stream_type=1b pid=41 prog_reg_desc=HDMV
[mpegts # 0x55761c4690] Before avformat_find_stream_info() pos: 0 bytes read:26560 seeks:0 nb_streams:1
...
ffmpeg tries to guess the format, and here it found that the stream was mpegts. Check what ffmpeg finds in your case. Note that its first guess may not be correct, so read the whole log to see which format it ends up decoding successfully.
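To check what ffmpeg detected without reading the whole log by eye, the probe lines can be picked out with a short script. This is only a minimal sketch, assuming the debug log has been saved as text. The function name probed_formats is made up for illustration, and the regular expression is based solely on the log lines quoted in this thread.

```python
import re

# Matches the part of a probe line that reads, for example:
#   Format mpegts probed with size=2048 and score=47
PROBE_RE = re.compile(r"Format (\S+) probed with size=(\d+) and score=(\d+)")


def probed_formats(log_text):
    """Return (format, size, score) tuples for every probe line in an ffmpeg debug log."""
    return [
        (m.group(1), int(m.group(2)), int(m.group(3)))
        for m in PROBE_RE.finditer(log_text)
    ]


# Two lines copied from the log excerpt above
sample = """\
[udp # 0x55761a27c0] end receive buffer size reported is 131072
[mpegts # 0x55761c4690] Format mpegts probed with size=2048 and score=47
"""
print(probed_formats(sample))
```

A result of h264 here would point towards the raw H.264 pipeline, mpegts towards a pipeline with tsdemux.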
Another possibility is that your stream is not RTP, but rather a raw H.264 stream. In that case you may be able to decode it with something like:
gst-launch-1.0 udpsrc port=46002 multicast-group=234.0.0.0 ! h264parse ! nvv4l2decoder ! autovideosink
If this works, for opencv you would use:
pipeline='udpsrc port=46002 multicast-group=234.0.0.0 ! h264parse ! nvv4l2decoder ! nvvidconv ! video/x-raw,format=BGRx ! videoconvert ! video/x-raw,format=BGR ! appsink drop=1'
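Since the stream type is a guess, it may help to keep the three candidate pipelines in one place and try them in turn. The following is a rough sketch, not a tested Jetson recipe: the helper names candidate_pipelines and open_first_working are invented here, the pipeline strings are the ones from this answer, and an opened capture is only assumed to mean the pipeline was accepted, not that frames will actually arrive.

```python
# Common tail: NVMM/NV12 -> BGRx (nvvidconv, hardware) -> BGR (videoconvert, CPU) -> OpenCV appsink
TAIL = ("nvvidconv ! video/x-raw,format=BGRx ! videoconvert ! "
        "video/x-raw,format=BGR ! appsink drop=1")


def candidate_pipelines(group="234.0.0.0", port=46002):
    """Return the candidate GStreamer pipelines from this answer, keyed by stream type."""
    src = f"udpsrc port={port} multicast-group={group}"
    decode = f"h264parse ! nvv4l2decoder ! {TAIL}"
    return {
        "rtp_h264": (f"{src} ! application/x-rtp,encoding-name=H264 ! "
                     f"rtpjitterbuffer latency=500 ! rtph264depay ! {decode}"),
        "rtp_mp2t": (f"{src} ! application/x-rtp,media=video,encoding-name=MP2T,"
                     f"clock-rate=90000,payload=33 ! rtpjitterbuffer latency=300 ! "
                     f"rtpmp2tdepay ! tsdemux ! {decode}"),
        "raw_h264": f"{src} ! {decode}",
    }


def open_first_working(pipelines):
    """Try each pipeline with OpenCV's GStreamer backend; return (name, capture) or (None, None)."""
    import cv2  # imported here so the string helper above works without OpenCV installed
    for name, pipeline in pipelines.items():
        cap = cv2.VideoCapture(pipeline, cv2.CAP_GSTREAMER)
        if cap.isOpened():
            return name, cap
        cap.release()
    return None, None


for name, pipeline in candidate_pipelines().items():
    print(name, "->", pipeline)
```

On the Jetson you would call open_first_working(candidate_pipelines()) and then read a few frames to confirm that the chosen pipeline really delivers video.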

Related

Gstreamer: Could not capture whole duration audio with pipeline

I have tried to capture audio together with video in a dynamic pipeline, but it seems that audio is only captured for a fragment of the time (not the full length of the file). I used gst-discoverer to investigate this problem.
File Tested
gst-discoverer-1.0 xxxx.mp4
Analyzing file:///xxxx.mp4
====== AIUR: 4.6.1 build on May 11 2021 03:19:55. ======
Core: AVI_PARSER_03.06.08 build on Sep 15 2020 02:45:45
file: /usr/lib/imx-mm/parser/lib_avi_parser_arm_elinux.so.3.1
------------------------
Track 00 [video_0] Enabled
Duration: 0:04:41.160000000
Language: und
Mime:
video/x-h264, parsed=(boolean)true, alignment=(string)au, stream-format=(string)byte-stream, width=(int)720, height=(int)480, framerate=(fraction)25/1
------------------------
====== VPUDEC: 4.6.1 build on May 11 2021 03:19:55. ======
wrapper: 3.0.0 (VPUWRAPPER_ARM64_LINUX Build on Jun 3 2021 04:20:32)
vpulib: 1.1.1
firmware: 1.1.1.65535
------------------------
Track 01 [audio_0] Enabled
Duration: 0:00:12.750476000
Language: und
Mime:
audio/x-raw, format=(string)S16LE, channels=(int)2, layout=(string)interleaved, rate=(int)44100, bitrate=(int)1411200
------------------------
Done discovering file:///xxxx.mp4
Properties:
Duration: 0:04:41.160000000
Seekable: yes
Live: no
container: Audio Video Interleave (AVI)
audio: Raw 16-bit PCM audio
Stream ID: b6e7e8a9768c340295f0f67833e05ab2e8fe2243b4d7ec4e5d6152cbe76dc8af/1
Language: <unknown>
Channels: 2 (front-left, front-right)
Sample rate: 44100
Depth: 16
Bitrate: 0
Max bitrate: 0
video: H.264 (Constrained Baseline Profile)
Stream ID: b6e7e8a9768c340295f0f67833e05ab2e8fe2243b4d7ec4e5d6152cbe76dc8af/0
Width: 720
Height: 480
Depth: 24
Frame rate: 25/1
Pixel aspect ratio: 1/1
Interlaced: false
Bitrate: 0
Max bitrate: 0
From the description given by gst-discoverer, it seems that the audio was only recorded for 12+ seconds. I then constructed a static pipeline and tested it with gst-launch:
gst-launch-1.0 -e -v \
v4l2src \
! video/x-raw,width=720,height=480,framerate=30/1,is-live=true \
! clockoverlay \
! videorate \
! video/x-raw,framerate=25/1 \
! tee name=tv \
tv. \
! queue name=q1a \
! vpuenc_h264 \
! h264parse \
! mux.video_0 \
tv. \
! queue name=q1b \
! vpuenc_h264 \
! tee name=tv2 \
tv2. \
! queue \
! rtph264pay pt=96 \
! udpsink host="x.x.x.x" port=3456 \
pulsesrc volume=8.0 \
! audioconvert \
! audioresample \
! volume volume=1.0 \
! audio/x-raw,rate=8000,channels=1,depth=8,format=S16LE \
! tee name=ta \
! queue \
! alawenc \
! tee name=ta2 \
ta2. \
! queue \
! rtppcmapay pt=8 \
! udpsink host="x.x.x.x" port=3458 \
ta2. \
! queue \
! mux.audio_0 \
avimux name=mux \
! queue \
! filesink location=file%02d.avi
The recorded file, analyzed with gst-discoverer:
Analyzing file:///home/root/file%2502d.avi
====== AIUR: 4.6.1 build on May 11 2021 03:19:55. ======
Core: AVI_PARSER_03.06.08 build on Sep 15 2020 02:45:45
file: /usr/lib/imx-mm/parser/lib_avi_parser_arm_elinux.so.3.1
------------------------
Track 00 [video_0] Enabled
Duration: 0:00:40.520000000
Language: und
Mime:
video/x-h264, parsed=(boolean)true, alignment=(string)au, stream-format=(string)byte-stream, width=(int)720, height=(int)480, framerate=(fraction)25/1
------------------------
====== VPUDEC: 4.6.1 build on May 11 2021 03:19:55. ======
wrapper: 3.0.0 (VPUWRAPPER_ARM64_LINUX Build on Jun 3 2021 04:20:32)
vpulib: 1.1.1
firmware: 1.1.1.65535
Track 01 [audio]: Disabled
Codec: 2, SubCodec: 0
------------------------
Done discovering file:///home/root/file%2502d.avi
Properties:
Duration: 0:00:40.520000000
Seekable: yes
Live: no
container: Audio Video Interleave (AVI)
video: H.264 (Constrained Baseline Profile)
Stream ID: 2f44a8a002c570424bca50bdc0bc9c743ea882e7cd3f855918368cd108ff977f/0
Width: 720
Height: 480
Depth: 24
Frame rate: 25/1
Pixel aspect ratio: 1/1
Interlaced: false
Bitrate: 0
Max bitrate: 0
It seems to me that no audio was recorded, but on opening the file some static noise can be heard (so I assume that audio was perhaps recorded).
So my questions are:
For the static pipeline, is audio really recorded? gst-discoverer and actually playing the file seem to show different outcomes.
For the dynamic pipeline (coded in C), what could the problem be? I would be grateful if anyone with similar prior experience could point out areas I could look into.
Thanks

FFmpeg remove silence with exact duration detected by detect silence

I have an audio file that has some silences, which I am detecting with ffmpeg's silencedetect and then trying to remove with silenceremove; however, there is some strange behavior. Specifically:
1) File's Basic info based on ffprobe show_streams
Input #0, mp3, from 'my_file.mp3':
Metadata:
encoder : Lavf58.64.100
Duration: 00:00:25.22, start: 0.046042, bitrate: 32 kb/s
Stream #0:0: Audio: mp3, 24000 Hz, mono, fltp, 32 kb/s
2) Using silencedetect
ffmpeg -i my_file.mp3 -af silencedetect=noise=-50dB:d=0.2 -f null -
I get this result
[mp3float # 000001ee50074280] overread, skip -7 enddists: -1 -1
[silencedetect # 000001ee5008a1c0] silence_start: 6.21417
[silencedetect # 000001ee5008a1c0] silence_end: 6.91712 | silence_duration: 0.702958
[silencedetect # 000001ee5008a1c0] silence_start: 16.44
[silencedetect # 000001ee5008a1c0] silence_end: 17.1547 | silence_duration: 0.714708
[mp3float # 000001ee50074280] overread, skip -10 enddists: -3 -3
[mp3float # 000001ee50074280] overread, skip -5 enddists: -4 -4
[silencedetect # 000001ee5008a1c0] silence_start: 24.4501
size=N/A time=00:00:25.17 bitrate=N/A speed=1.32e+03x
video:0kB audio:1180kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown
[silencedetect # 000001ee5008a1c0] silence_end: 25.176 | silence_duration: 0.725917
That also match the values and points based on Adobe Audition
So far all good.
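For the later calculations it is convenient to have these values as numbers rather than log text. A minimal sketch of such a parser follows; the function name parse_silences is hypothetical, and the patterns are written against the silencedetect lines shown above only.

```python
import re

START_RE = re.compile(r"silence_start:\s*(-?[\d.]+)")
END_RE = re.compile(r"silence_end:\s*(-?[\d.]+)\s*\|\s*silence_duration:\s*(-?[\d.]+)")


def parse_silences(log_text):
    """Return a list of (start, end, duration) tuples, in seconds, from silencedetect output."""
    silences = []
    start = None
    for line in log_text.splitlines():
        m = START_RE.search(line)
        if m:
            start = float(m.group(1))
            continue
        m = END_RE.search(line)
        if m and start is not None:
            silences.append((start, float(m.group(1)), float(m.group(2))))
            start = None
    return silences


# Lines copied from the output above
log = """\
[silencedetect # 000001ee5008a1c0] silence_start: 6.21417
[silencedetect # 000001ee5008a1c0] silence_end: 6.91712 | silence_duration: 0.702958
[silencedetect # 000001ee5008a1c0] silence_start: 16.44
[silencedetect # 000001ee5008a1c0] silence_end: 17.1547 | silence_duration: 0.714708
[silencedetect # 000001ee5008a1c0] silence_start: 24.4501
size=N/A time=00:00:25.17 bitrate=N/A speed=1.32e+03x
[silencedetect # 000001ee5008a1c0] silence_end: 25.176 | silence_duration: 0.725917
"""
for start, end, duration in parse_silences(log):
    print(f"{start:>8} -> {end:>8}  ({duration} s)")
```

ffmpeg writes this log to stderr, so when running it from Python the text would come from the stderr of the process, for example via subprocess.run with stderr=subprocess.PIPE and text=True.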
3) Now, based on some calculations (which is based on application's logic on what should be the final duration of the audio) I am trying to delete the silence with "0.725917"s duration. For that, based on ffmpeg docs (https://ffmpeg.org/ffmpeg-filters.html#silencedetect)
Trim all silence encountered from beginning to end where there is more
than 1 second of silence in audio:
silenceremove=stop_periods=-1:stop_duration=1:stop_threshold=-90dB
I run this command
ffmpeg -i my_file.mp3 -af silenceremove=stop_periods=-1:stop_threshold=-50dB:stop_duration=0.72 result1.mp3
So I expect it to delete only the silence with a duration of "0.725917"s (the last one in the output above); however, it deletes the silence that starts at 16.44s with a duration of "0.714708"s.
4) Running silencedetect on result1.mp3 with the same options gives even stranger results
ffmpeg -i result1.mp3 -af silencedetect=noise=-50dB:d=0.2 -f null -
result
[mp3float # 0000017723404280] overread, skip -5 enddists: -4 -4
[silencedetect # 0000017723419540] silence_start: 6.21417
[silencedetect # 0000017723419540] silence_end: 6.92462 | silence_duration: 0.710458
[mp3float # 0000017723404280] overread, skip -7 enddists: -6 -6
[mp3float # 0000017723404280] overread, skip -7 enddists: -2 -2
[mp3float # 0000017723404280] overread, skip -6 enddists: -1 -1
Last message repeated 1 times
[silencedetect # 0000017723419540] silence_start: 23.7308
size=N/A time=00:00:24.45 bitrate=N/A speed=1.33e+03x
video:0kB audio:1146kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown
[silencedetect # 0000017723419540] silence_end: 24.456 | silence_duration: 0.725167
So, the results are:
With a command to remove silences longer than "0.72" seconds, a silence of "0.714708"s was removed, while a silence of "0.725917"s remained (well, it actually changed a little, as per the 3rd point).
The first silence, which started at "6.21417" and had a duration of "0.702958"s, now suddenly has a duration of "0.710458"s.
The 3rd silence, which started at "24.4501" (it now starts at 23.7308, obviously because the 2nd silence was removed) and had a duration of "0.725917"s, is now suddenly "0.725167"s (not a big difference, but why should removing another silence change this silence's duration at all?).
Accordingly the expected results are:
Only the silences that match the provided condition (stop_duration=0.72) should be removed. In this specific example only the last one, but in general any silence that matches the condition of the length - irrelevant of their positioning (start, end or in the middle)
Other silences should remain with same exact duration they were before
FFMpeg: 4.2.4-1ubuntu0.1, Ubuntu: 20.04.2
Some attempts and results, while playing with ffmpeg options
a)
ffmpeg -i my_file.mp3 -af silenceremove=stop_periods=-1:stop_threshold=-50dB:stop_duration=0.72:detection=peak tmp1.mp3
result:
First and second silences are removed, 3rd silence's duration remains exactly the same
b)
ffmpeg -i my_file.mp3 -af silenceremove=stop_periods=-1:stop_threshold=-50dB:stop_duration=0.71 tmp_0.71.mp3
result:
First and second silences are removed, 3rd silence remains, but the duration becomes "0.72075"s
c)
ffmpeg -i my_file.mp3 -af silenceremove=stop_periods=-1:stop_threshold=-50dB:stop_duration=0.7 tmp_0.7.mp3
result:
all 3 silence are removed
d) the edge case
this command still removes the second silence (after which the first silence become exactly as in point #4 and last silence becomes "0.721375")
ffmpeg -i my_file.mp3 -af silenceremove=stop_periods=-1:stop_threshold=-50dB:stop_duration=0.72335499999 tmp_0.72335499999.mp3
but this one, again does not remove any silence:
ffmpeg -i my_file.mp3 -af silenceremove=stop_periods=-1:stop_threshold=-50dB:stop_duration=0.723355 tmp_0.723355.mp3
e) window param case 0.03
ffmpeg -i my_file.mp3 -af silenceremove=stop_periods=-1:stop_threshold=-50dB:stop_duration=0.72:window=0.03 window_0.03.mp3
does not remove any silence, but running silencedetect
ffmpeg -i window_0.03.mp3 -af silencedetect=noise=-50dB:d=0.2 -f null -
gives this result (compare with silences in result1.mp3 - from point #4 )
[mp3float # 000001c5c8824280] overread, skip -5 enddists: -4 -4
[silencedetect # 000001c5c883a040] silence_start: 6.21417
[silencedetect # 000001c5c883a040] silence_end: 6.92462 | silence_duration: 0.710458
[mp3float # 000001c5c8824280] overread, skip -7 enddists: -6 -6
[mp3float # 000001c5c8824280] overread, skip -7 enddists: -2 -2
[silencedetect # 000001c5c883a040] silence_start: 16.4424
[silencedetect # 000001c5c883a040] silence_end: 17.1555 | silence_duration: 0.713167
[mp3float # 000001c5c8824280] overread, skip -6 enddists: -1 -1
Last message repeated 1 times
[silencedetect # 000001c5c883a040] silence_start: 24.4508
size=N/A time=00:00:25.17 bitrate=N/A speed=1.24e+03x
video:0kB audio:1180kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown
[silencedetect # 000001c5c883a040] silence_end: 25.176 | silence_duration: 0.725167
f) window case 0.01
ffmpeg -i my_file.mp3 -af silenceremove=stop_periods=-1:stop_threshold=-50dB:stop_duration=0.72:window=0.01 window_0.01.mp3
removes the first and second silences; silencedetect with the same params gives the following result
[mp3float # 000001ea631d4280] overread, skip -5 enddists: -4 -4
Last message repeated 1 times
[mp3float # 000001ea631d4280] overread, skip -7 enddists: -2 -2
[mp3float # 000001ea631d4280] overread, skip -6 enddists: -1 -1
Last message repeated 1 times
[silencedetect # 000001ea631ea1c0] silence_start: 23.0108
size=N/A time=00:00:23.73 bitrate=N/A speed=1.2e+03x
video:0kB audio:1113kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown
[silencedetect # 000001ea631ea1c0] silence_end: 23.736 | silence_duration: 0.725167
Any thoughts, ideas, points are much appreciated.
You're suffering from two things:
You are converting back to MP3 (a lossy format), which causes result1.mp3 to be re-encoded and to differ slightly from a perfect cut. The fix is to work with .wav files (a lossless format).
The silenceremove filter uses a window, and you need to set it to 0 to work sample by sample.
ffmpeg -i my_file.mp3 my_file.wav
ffmpeg -i my_file.wav -af silencedetect=noise=-50dB:d=0.2 -f null -
ffmpeg -i my_file.wav -af silenceremove=stop_periods=-1:stop_threshold=-50dB:stop_duration=0.72:window=0 result1.wav
ffmpeg -i result1.wav -af silencedetect=noise=-50dB:d=0.2 -f null -
Output of the last command. I would consider this a solid solution, because the silence starts and durations match their values before the cut exactly:
[silencedetect # 0x5570a855b400] silence_start: 6.21417
[silencedetect # 0x5570a855b400] silence_end: 6.91712 | silence_duration: 0.702958
[silencedetect # 0x5570a855b400] silence_start: 16.44
[silencedetect # 0x5570a855b400] silence_end: 17.1547 | silence_duration: 0.714708
size=N/A time=00:00:24.45 bitrate=N/A speed=4.49e+03x
You can then reencode it to .mp3 if you want.
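If the goal is to remove only specific silences whose timestamps are already known from silencedetect, a different technique from the one above is to cut by timestamp with atrim and concat instead of relying on silenceremove's own detection. The sketch below is an assumption-laden illustration: the helper names are invented, the interval list is copied from the question's silencedetect output, and it has not been checked how ffmpeg behaves when the last kept segment is empty (as can happen when the removed silence runs to the end of the file).

```python
def keep_segments(drop):
    """Turn (start, end) intervals to drop into (start, end) segments to keep.

    An end of None means "until the end of the file".
    """
    segments, cursor = [], 0.0
    for start, end in sorted(drop):
        if start > cursor:
            segments.append((cursor, start))
        cursor = max(cursor, end)
    segments.append((cursor, None))
    return segments


def build_filter_complex(drop):
    """Build a -filter_complex string that keeps everything outside the given intervals."""
    parts, labels = [], []
    for i, (start, end) in enumerate(keep_segments(drop)):
        trim = f"atrim=start={start}" + (f":end={end}" if end is not None else "")
        # asetpts resets each kept segment to start at zero before concatenation
        parts.append(f"[0:a]{trim},asetpts=PTS-STARTPTS[a{i}]")
        labels.append(f"[a{i}]")
    parts.append(f"{''.join(labels)}concat=n={len(labels)}:v=0:a=1[out]")
    return ";".join(parts)


# (start, end, duration) values taken from the question's silencedetect output
silences = [(6.21417, 6.91712, 0.702958),
            (16.44, 17.1547, 0.714708),
            (24.4501, 25.176, 0.725917)]
drop = [(start, end) for start, end, duration in silences if duration >= 0.72]
fc = build_filter_complex(drop)
print(fc)

# Argument list for subprocess.run; result_exact.wav is a made-up output name
command = ["ffmpeg", "-i", "my_file.wav", "-filter_complex", fc,
           "-map", "[out]", "result_exact.wav"]
```

With this approach the duration threshold is applied in Python, so which silences get removed no longer depends on the filter's window or detection mode.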

FFMpeg ZeroMQ Filter stops working after a short while

I run FFMpeg as follows:
#!/bin/bash
fc="[1]scale=iw/2:ih/2 [pip]; [pip] zmq=bind_address=tcp\\\://127.0.0.1\\\:1235,[0]overlay=x=0:y=0"
ffmpeg -v verbose -re -y -i test.mkv -i test2.mkv -filter_complex "$fc" -f mpegts -codec:v libx264 -preset ultrafast resultzmq.mp4
I then start a Python 3 app to send zmq commands to FFMpeg:
import zmq
import time
import sys
from multiprocessing import Process
context = zmq.Context()
port = "1235"
print("Connecting to server with port {}".format(port))
socket = context.socket(zmq.REQ)
socket.connect("tcp://localhost:{}".format(port))
for request in range (20):
    print("Sending request ", request, "...")
    socket.send_string("Parsed_overlay_2 x 200")
    message = socket.recv()
    print("Received reply ", request, "[", message, "]")
    time.sleep (1)
This runs fine for about 40 seconds, at which point I get this from FFmpeg (it stops receiving commands):
frame= 918 fps= 24 q=19.0 size= 12192kB time=00:00:38.82 bitrate=2572.6kbits
frame= 931 fps= 24 q=19.0 size= 12402kB time=00:00:39.30 bitrate=2585.1kbits
[Parsed_zmq_1 # 0x56185e089220] Processing command #8 target:Parsed_overlay_2 command:x arg:200
[Parsed_zmq_1 # 0x56185e089220] Sending command reply for command #8:
0 Success
frame= 938 fps= 24 q=19.0 size= 12516kB time=00:00:39.82 bitrate=2574.1kbits
frame= 952 fps= 24 q=19.0 size= 12752kB time=00:00:40.33 bitrate=2590.0kbits
[Parsed_zmq_1 # 0x56185e089220] Processing command #9 target:Parsed_overlay_2 command:x arg:200
[Parsed_zmq_1 # 0x56185e089220] Sending command reply for command #9:
0 Success
frame= 963 fps= 24 q=19.0 size= 12932kB time=00:00:40.81 bitrate=2595.6kbits
frame= 976 fps= 24 q=19.0 size= 13121kB time=00:00:41.31 bitrate=2601.4kbits
frame= 992 fps= 24 q=19.0 size= 13434kB time=00:00:41.84 bitrate=2629.9kbits
frame= 1002 fps= 24 q=18.0 size= 13582kB time=00:00:42.34 bitrate=2627.2kbits
and this from the Python 3 client:
Sending request 8 ...
Received reply 8 [ b'0 Success' ]
Sending request 9 ...
Received reply 9 [ b'0 Success' ]
Sending request 10 ...
The disconnect always happens at the same time, no matter when I start the Python client. If I start it after 40 seconds, it won't send any commands at all.
On my actual application, the same thing happens but at about 60 seconds.
I tried setting up a simple Python server/client and the problem does not occur. So I assume the problem must have something to do with FFMpeg and its zmq plugin?
If you would like to test this yourself, just make sure test.mkv and test2.mkv are videos longer than 1 minute.
I would really appreciate any assistance!
After aimlessly changing the code for the better part of the day, I finally found the solution:
#!/bin/bash
fc="[1]scale=iw/2:ih/2,[0]overlay=x=0:y=0,zmq=bind_address=tcp\\\://127.0.0.1\\\:1235 "
ffmpeg -v verbose -re -y -i test.mkv -i server_upgrade_2.mkv -filter_complex "$fc" -f mpegts -codec:v libx264 -preset ultrafast resultzmq.mp4
My guess is that even though the position of the zmq filter does not matter when you try to issue commands (you can issue commands to all the filters), when the input to the zmq filter ends, so does the zmq filter.
Using the REQ/REP archetype in any seriously meant, production-grade distributed system is indeed a
Highway to Hell
Never opt for the trivial-looking false beauty of REQ/REP. Never. It can and will fall into an unsalvageable mutual deadlock. The question is not if, but when.
I have not found any explicit reason why FFmpeg uses REP, or whether it could use another, more suitable archetype, such as PAIR/PAIR for pipeline-filter inter-node processing, or PUSH/PULL, or some more advanced, composite signalling/messaging layer. Again, my other posts here on ZeroMQ give more reasoning and examples.
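As long as FFmpeg's zmq filter is used as-is, the client at least does not have to block forever in recv() when no reply comes. One common way is to poll with a timeout and throw the REQ socket away on failure. This is a hedged sketch using pyzmq: the function names, the 2-second timeout and the one-socket-per-request design are choices made for illustration, not something taken from FFmpeg's documentation.

```python
def format_command(target, command, arg):
    """Format a command the way the question's client sends it: 'TARGET COMMAND ARG'."""
    return f"{target} {command} {arg}"


def send_command(endpoint, message, timeout_ms=2000):
    """Send one request; return the reply string, or None if no reply arrives in time."""
    import zmq  # pyzmq; imported lazily so format_command works without it

    context = zmq.Context.instance()
    socket = context.socket(zmq.REQ)
    socket.setsockopt(zmq.LINGER, 0)  # do not block on close if the request was never delivered
    socket.connect(endpoint)
    try:
        socket.send_string(message)
        # poll() returns a non-zero event mask once a reply is ready to be read
        if socket.poll(timeout_ms, zmq.POLLIN):
            return socket.recv_string()
        return None  # timed out; this REQ socket is discarded rather than reused
    finally:
        socket.close()


# Usage, with the endpoint and command from the question:
#   reply = send_command("tcp://localhost:1235",
#                        format_command("Parsed_overlay_2", "x", 200))
#   print("reply:", reply if reply is not None else "no reply (timeout)")
print(format_command("Parsed_overlay_2", "x", 200))
```

A fresh socket per request avoids the situation where a REQ socket that never received its reply refuses to send again.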

Piping pi's opencv video to ffmpeg for Youtube streaming

This is a small python3 script reading off picam using OpenCV :
#picamStream.py
import sys, os
from picamera.array import PiRGBArray
from picamera import PiCamera
import time
import cv2
# initialize the camera and grab a reference to the raw camera capture
camera = PiCamera()
camera.resolution = (960, 540)
camera.framerate = 30
rawCapture = PiRGBArray(camera, size=(960, 540))
# allow the camera to warmup
time.sleep(0.1)
# capture frames from the camera
for frame in camera.capture_continuous(rawCapture, format="bgr", use_video_port=True):
    image = frame.array
    # ---------------------------------
    # .
    # Opencv image processing goes here
    # .
    # ---------------------------------
    # write the raw BGR bytes to stdout (tobytes() replaces the deprecated tostring())
    os.write(1, image.tobytes())
    # clear the stream in preparation for the next frame
    rawCapture.truncate(0)
# end
And I am trying to pipe it to ffmpeg to stream to YouTube.
My understanding is that I need to combine the two commands below to come up with a new ffmpeg command.
Piping picam live video to ffmpeg for Youtube streaming.
raspivid -o - -t 0 -vf -hf -w 960 -h 540 -fps 25 -b 1000000 | ffmpeg -re -ar 44100 -ac 2 -acodec pcm_s16le -f s16le -ac 2 -i /dev/zero -f h264 -i - -vcodec copy -acodec aac -ab 128k -g 50 -strict experimental -f flv rtmp://a.rtmp.youtube.com/live2/[STREAMKEY]
Piping OPENCV raw video to ffmpeg for mp4 file.
python3 picamStream.py | ffmpeg -f rawvideo -pixel_format bgr24 -video_size 960x540 -framerate 30 -i - foo.mp4
So far I've had no luck. Can anyone help me with this?
This is the program I use on my Raspberry Pi.
#main.py
import subprocess
import cv2
cap = cv2.VideoCapture(0)
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 640)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)
command = ['ffmpeg',
           '-f', 'rawvideo',
           '-pix_fmt', 'bgr24',
           '-s','640x480',
           '-i','-',
           '-ar', '44100',
           '-ac', '2',
           '-acodec', 'pcm_s16le',
           '-f', 's16le',
           '-ac', '2',
           '-i','/dev/zero',
           '-acodec','aac',
           '-ab','128k',
           '-strict','experimental',
           '-vcodec','h264',
           '-pix_fmt','yuv420p',
           '-g', '50',
           '-vb','1000k',
           '-profile:v', 'baseline',
           '-preset', 'ultrafast',
           '-r', '30',
           '-f', 'flv',
           'rtmp://a.rtmp.youtube.com/live2/[STREAMKEY]']
pipe = subprocess.Popen(command, stdin=subprocess.PIPE)
while True:
    ret, frame = cap.read()
    if not ret:
        # stop when the camera delivers no frame
        break
    # tobytes() replaces the deprecated tostring()
    pipe.stdin.write(frame.tobytes())
pipe.kill()
cap.release()
YouTube needs an audio source, so use -i /dev/zero to supply silent audio.
I hope it helps you.
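When raw frames are piped like this, the size declared to ffmpeg has to match the bytes actually written, otherwise the picture comes out scrambled. The sketch below parameterizes the command for that purpose. It is an illustration under assumptions: the helper names are invented, the option list is a trimmed version of the one above, and -framerate on the raw input is an addition that the original command does not use.

```python
def raw_frame_bytes(width, height, channels=3):
    """Size in bytes of one packed 8-bit frame (bgr24 has 3 channels)."""
    return width * height * channels


def build_ffmpeg_command(width, height, fps, url):
    """Build an ffmpeg argument list for raw BGR frames on stdin plus silent audio."""
    return [
        "ffmpeg",
        # input 0: raw BGR frames read from stdin
        "-f", "rawvideo", "-pix_fmt", "bgr24",
        "-s", f"{width}x{height}", "-framerate", str(fps), "-i", "-",
        # input 1: silent audio generated from /dev/zero
        "-f", "s16le", "-ar", "44100", "-ac", "2", "-i", "/dev/zero",
        # output: H.264 video and AAC audio in an FLV container
        "-vcodec", "h264", "-pix_fmt", "yuv420p", "-g", "50", "-vb", "1000k",
        "-acodec", "aac", "-ab", "128k",
        "-f", "flv", url,
    ]


command = build_ffmpeg_command(640, 480, 30, "rtmp://a.rtmp.youtube.com/live2/[STREAMKEY]")
print(" ".join(command))
print(raw_frame_bytes(640, 480), "bytes per frame")
```

Before each write, comparing frame.nbytes with raw_frame_bytes(width, height) catches the case where the camera silently delivers a different resolution than the one requested.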

RTP Streaming raw AAC

I have raw AAC frames (variable size) and I want to stream them over RTP. I have their ADTS header.
header sample :
0xff 0xf9 0x5c 0x60 0x7 0x40 0x00
According to this the format is :
mpeg-2 (strange because I tell encoder to output mpeg-4) /
no crc /
AAC LC /
22050 Hz /
mono channel /
1 AAC frame
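The fields listed above can be checked by decoding the seven header bytes directly. This is a small sketch of the standard ADTS bit layout; the function name and dictionary keys are made up for illustration.

```python
# Sampling frequencies by sampling_frequency_index (indices 13-15 are reserved or escape values)
SAMPLE_RATES = [96000, 88200, 64000, 48000, 44100, 32000, 24000,
                22050, 16000, 12000, 11025, 8000, 7350]


def parse_adts_header(header):
    """Decode the main fields of a 7-byte ADTS header (no CRC)."""
    b = bytes(header)
    if len(b) < 7 or b[0] != 0xFF or (b[1] & 0xF0) != 0xF0:
        raise ValueError("not an ADTS header (syncword 0xFFF missing)")
    rate_index = (b[2] >> 2) & 0x0F
    return {
        "mpeg_version": 2 if (b[1] >> 3) & 1 else 4,   # ID bit: 1 = MPEG-2, 0 = MPEG-4
        "has_crc": (b[1] & 1) == 0,                    # protection_absent == 0 means a CRC follows
        "audio_object_type": (b[2] >> 6) + 1,          # 2 = AAC LC
        "sample_rate": SAMPLE_RATES[rate_index] if rate_index < len(SAMPLE_RATES) else None,
        "channel_config": ((b[2] & 1) << 2) | (b[3] >> 6),
        "frame_length": ((b[3] & 0x03) << 11) | (b[4] << 3) | (b[5] >> 5),  # includes the header
        "aac_frames": (b[6] & 0x03) + 1,
    }


# The header sample from the question
info = parse_adts_header([0xFF, 0xF9, 0x5C, 0x60, 0x07, 0x40, 0x00])
for key, value in info.items():
    print(f"{key}: {value}")
```

For the sample header this gives MPEG-2, no CRC, AAC LC, 22050 Hz, channel configuration 1 (mono) and one AAC frame, plus a total frame length of 58 bytes including the 7 header bytes.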
I tried to add this header (part 3) and this sdp :
v=0
o=- 0 0 IN IP6 ::1
s=No Name
c=IN IP6 ::1
t=0 0
a=tool:libavformat 55.7.100
m=audio 6000 RTP/AVP 14
a=rtpmap:14 MPA/22050/1
Without any success: ffmpeg keeps reporting "header missing". Any help would be appreciated.
