GStreamer: RTP jitter buffer not working properly with packet loss?

For a VoIP speech quality monitoring application I need to compare an incoming RTP audio stream to a reference signal. For the signal comparison itself I use pre-existing, special-purpose tools. For the other parts (except packet capture) the Gstreamer library seemed to be a good choice. I use the following pipeline to simulate a bare-bones VoIP client:
filesrc location=foobar.pcap ! pcapparse ! "application/x-rtp, payload=0, clock-rate=8000"
! gstrtpjitterbuffer ! rtppcmudepay ! mulawdec ! audioconvert
! audioresample ! wavenc ! filesink location=foobar.wav
The pcap file contains a single RTP media stream. I crafted a capture file that's missing 50 of the original 400 UDP datagrams. For the given audio sample (8s long for my example):
[XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX]
with a certain amount of consecutive packet loss I'd expect an audio signal like this to be output ('-' denotes silence):
[XXXXXXXXXXXXXXXXXXXXXXXX-----XXXXXXXXXXX]
however what is actually saved in the audio file is this (1s shorter for my example):
[XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX]
It seems that the jitter buffer, a crucial part for this application, is not working properly. Could this be an incompatibility with / a shortcoming of the pcapparse element? Am I missing a key part in the pipeline to ensure time synchronization? What else could be causing this?

The issue could be solved by slightly altering the pipeline:
An audiorate element needs to be added before wavenc; it "produces a perfect stream by inserting or dropping samples as needed".
However, this works only if audiorate receives the packet-loss events. For this, the do-lost property of gstrtpjitterbuffer needs to be set to true.
Here's the corrected pipeline:
filesrc location=foobar.pcap ! pcapparse
! "application/x-rtp, payload=0, clock-rate=8000"
! gstrtpjitterbuffer do-lost=true ! rtppcmudepay ! mulawdec
! audioconvert ! audioresample ! audiorate ! wavenc
! filesink location=foobar.wav

GStreamer may just use the dejitter buffer to smooth out the packets on the way to the (audio) output. This wouldn't be unusual; it's the bare minimum definition of dejittering.
It may go so far as reordering out-of-order packets or deleting duplicates, but packet loss concealment (your scenario) can be quite complex.
Basic implementations just duplicate the last received packet, whilst more advanced implementations analyse and reconstruct the tone of the last received packets to smooth out the audio.
It sounds like your application performance will depend on the exact implementation of loss concealment, so even if GStreamer does do "something", you may have a hard time quantifying its impact on your results unless you understand it in great detail.
Perhaps you could try a pcap with a couple of out-of-order and duplicate packets and check whether GStreamer at least reorders/deletes them; that would go some way towards clarifying what is happening.
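To make such an experiment observable, a rough, untested sketch along these lines could be used (foobar_reordered.pcap is a hypothetical capture with a few reordered/duplicated packets; in GStreamer 1.x the jitter buffer element is called rtpjitterbuffer, and fakesink silent=false together with -v prints each buffer that leaves the jitter buffer, including its timestamp):
# Hypothetical capture with reordered/duplicated packets; inspect what leaves the jitter buffer
gst-launch-1.0 -v filesrc location=foobar_reordered.pcap ! pcapparse
! "application/x-rtp, payload=0, clock-rate=8000"
! rtpjitterbuffer do-lost=true ! rtppcmudepay ! fakesink silent=false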

Related

Streaming audio to 2N equipment using GStreamer

I work for the IT department of my school and I am currently re-doing the IP audio system that is used for announcements or the school bell. My school uses devices from the company "2N" for this. The audio transmission is done via multicast.
Until now, all devices used the same multicast address, which made it impossible to address only one building. So, in order to be able to address the buildings independently, I use different multicast addresses for each building.
For example:
Building A: 224.0.0.5:8910
Building B: 224.0.0.6:8910
Until now, one device was the "master" sending the school chime via the single multicast address. Unfortunately, the device is not able to address more than four multicast addresses simultaneously. Therefore I thought I could use GStreamer for this.
However, I have not managed to address the 2N devices via GStreamer. The devices are configured to a specific multicast address as shown in the example above and all use UDP port 8910. The interface offers PCMU, PCMA, L16 / 16kHz, G.729 and G.722 as codecs. Unfortunately, the manufacturer does not publish more information.
I currently use the "L16 / 16kHz" codec because it provides the best audio quality. I can send sounds over it from the master, but not with GStreamer.
I have tried these commands without success:
# Using "L16 / 16kHz"
gst-launch-1.0.exe -v audiotestsrc ! audioconvert ! rtpL16pay ! udpsink host=224.0.0.8 port=8910
# Using PCMU
gst-launch-1.0.exe -v filesrc location=Pausengong.wav ! decodebin ! audioconvert ! audioresample ! mulawenc ! rtppcmupay ! udpsink host=224.0.0.8 port=8910
Another complicating factor is that the 2N devices stay completely silent when they receive invalid data - no crackling or noise.
I'm sorry I can't contribute more information. To be honest, I'm a bit out of my depth and can't work well with the small amount of information and the insufficient support from the manufacturer. Maybe someone here has a bright idea!
Thanks in advance!
With kind regards
Linus
I still can't believe it, but I solved my problem! By using Wireshark to analyse the packets sent by the original 2N master device, I discovered important parameters that have to be passed to GStreamer.
When using PCMU (G.711u) codec:
gst-launch-1.0.exe -vv filesrc location="test.wav" ! decodebin ! audioconvert ! audioresample ! mulawenc ! rtppcmupay mtu=172 ! udpsink host=224.0.0.8 port=8910 multicast-iface=192.168.52.151
If your sender device (in my case a Windows PC) has multiple network interfaces, make sure that you set multicast-iface to the IP address of your device. When it is not specified, the data might be sent out of the wrong network interface and never actually reach the target device.
The next thing to specify was the mtu (maximum transmission unit). As far as I could tell from Wireshark, the 2N devices require an MTU of 172 when using PCMU (G.711u).
When using L16/16kHz codec:
gst-launch-1.0.exe -vv filesrc location="test.wav" ! decodebin ! audioconvert ! audioresample ! rtpL16pay mtu=652 pt=98 ! udpsink host=224.0.0.8 port=8910 multicast-iface=192.168.52.151
Again, make sure that you have your multicast-iface set correctly. Besides that, the L16/16kHz codec in 2N devices requires an MTU of 652 and pt=98 (payload type).
I hope that this information might help others and I'm glad that the problem is solved now. The necessary values for 2N devices were reverse-engineered using Wireshark. Helpful documentation for GStreamer: https://thiblahute.github.io/GStreamer-doc/rtp-1.0/rtpL16pay.html
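To double-check from the receiving side what actually arrives on the multicast group, an untested sketch like the following could be run on a second PC (the caps mirror the PCMU stream above; the address, port and interface IP are the example values from this thread and need to be adapted to the receiving machine):
# Receive-side check for the PCMU multicast stream (example/assumed values)
gst-launch-1.0.exe -v udpsrc address=224.0.0.8 port=8910 multicast-iface=192.168.52.151 caps="application/x-rtp, media=audio, clock-rate=8000, encoding-name=PCMU, payload=0" ! rtppcmudepay ! mulawdec ! audioconvert ! audioresample ! autoaudiosink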

How to fix image problems when streaming h.264 via gstreamer udpsink

Using gstreamer I want to stream images from several Logitech C920 webcams to a Janus media server in RTP/h.264 format. The webcams produce h.264 encoded video streams, so I can send the streams to a UDP sink without re-encoding data, only payloading it.
I'm using the gst-interpipe plugin to switch between the different webcams, so that the video stream received by Janus stays the same, but with images coming from whatever webcam I choose.
It works but I'm experiencing some problems with broken frames where the colors are gray and details are blurred away, mainly the first 5 - 10 seconds after I switch between webcam source streams. After that the images correct themselves.
[Screenshots: the broken first frames vs. the picture after 5 - 10 seconds or more]
First I thought it was a gst-interpipe specific problem, but I can reproduce it by simply setting up two pipelines - one sending a video stream to a UDP sink and one reading from a UDP source:
gst-launch-1.0 -v -e v4l2src device=/dev/video0 ! queue
! video/x-h264,width=1280,height=720,framerate=30/1
! rtph264pay config-interval=1 ! udpsink host=127.0.0.1 port=8004
gst-launch-1.0 -v udpsrc port=8004
caps="application/x-rtp, media=video, clock-rate=90000, encoding-name=H264, payload=96"
! rtph264depay ! decodebin ! videoconvert ! xvimagesink
NB: I'm not experiencing this problem if I send the video stream directly to an xvimagesink, i.e. when not using UDP streaming.
Am I missing some important parameters in my pipelines? Is this a buffering issue? I really have no idea how to correct this.
Any help is greatly appreciated.
Due to the temporal dependencies in a video stream, you cannot just tune in to the stream and expect it to be decodable immediately. Correct decoding can only start at random access point (RAP) frames (e.g. I- or IDR-frames). Before that you will get image data that relies on video frames you haven't received, so it will look broken. Some decoders offer control over what to do in these cases. avdec_h264, for example, has an output-corrupt option (although I don't know how it behaves for "correct" frames that are merely missing their reference frames). Decoders may also have options to skip everything until a RAP frame occurs; this depends on your specific decoder implementation. Note however that with any of these options the initial delay before you see any image will increase.
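As a hedged illustration of such decoder options (assuming the avdec_h264 decoder from gst-libav is available on the receiving side), the receiver pipeline from the question could pin that decoder and ask it not to push frames flagged as corrupt; frames that are merely missing their reference frames may still slip through, as noted above:
# Untested sketch: drop frames the decoder flags as corrupt
gst-launch-1.0 -v udpsrc port=8004 caps="application/x-rtp, media=video, clock-rate=90000, encoding-name=H264, payload=96" ! rtph264depay ! avdec_h264 output-corrupt=false ! videoconvert ! xvimagesink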

Mix multiple audio streams into one playback-sound using Gstreamer

I want to use Gstreamer to receive audio streams from multiple points on the same port.
Indeed, I want to stream audio from different nodes on the network to one device that listens to the incoming audio streams and mixes them before playback.
I know that I should use audiomixer or liveadder to do such a task.
But I can't get it to work: the mixer doesn't behave correctly, and when two audio streams come in, the output sound is noisy and corrupted.
I used the following command :
gst-launch-1.0.exe -v udpsrc port=5001 caps="application/x-rtp"
! queue ! rtppcmudepay ! mulawdec ! audiomixer name=mix
mix. ! audioconvert ! audioresample ! autoaudiosink
but it doesn't work.
Packets arriving on the same port cannot be demultiplexed the way your command assumes. To receive multiple audio streams on the same port you should demultiplex them by SSRC, using the rtpssrcdemux element.
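A rough, untested sketch of that single-port SSRC approach might look like this (the SSRC values in the pad names are placeholders; the real ones would have to be read from Wireshark or from the senders, since rtpssrcdemux exposes one src_<SSRC> pad per incoming stream):
gst-launch-1.0 -v udpsrc port=5001 caps="application/x-rtp, media=audio, clock-rate=8000, encoding-name=PCMU, payload=0" ! rtpssrcdemux name=demux \
demux.src_11111111 ! queue ! rtppcmudepay ! mulawdec ! audioconvert ! audioresample ! mix. \
demux.src_22222222 ! queue ! rtppcmudepay ! mulawdec ! audioconvert ! audioresample ! mix. \
audiomixer name=mix ! audioconvert ! autoaudiosink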
However, to receive multiple audio streams on multiple ports and mix them, you could use the liveadder element. An example that receives two audio streams from two ports and mixes them is as follows:
gst-launch-1.0 -v udpsrc name=src5001 caps="application/x-rtp" port=5001
! rtppcmudepay ! mulawdec ! audioresample ! liveadder name=m_adder ! alsasink device=hw:0,0
udpsrc name=src5002 caps="application/x-rtp" port=5002
! rtppcmudepay ! mulawdec ! audioresample ! m_adder.
First, you probably want to use audiomixer over liveadder, as the former guarantees synchronization of the different audio streams.
Then, about your mixing problem: you mention that the output sound is "noisy and corrupted", which makes me think of a problem with audio levels. Though audiomixer clips the output audio to the maximum allowed amplitude range, this can result in audio artefacts if your sources are too loud. Thus, you might want to play with the volume of both sources.
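If levels do turn out to be the issue, one way to experiment from gst-launch is to put a volume element in front of each branch feeding the mixer. A hedged, untested sketch based on the two-port example above (the 0.5 values are arbitrary):
gst-launch-1.0 -v udpsrc caps="application/x-rtp" port=5001 ! rtppcmudepay ! mulawdec ! volume volume=0.5 ! audioconvert ! audioresample ! mix. \
udpsrc caps="application/x-rtp" port=5002 ! rtppcmudepay ! mulawdec ! volume volume=0.5 ! audioconvert ! audioresample ! mix. \
audiomixer name=mix ! audioconvert ! autoaudiosink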

Sync audio and video when playing mp4 file with GStreamer

I need to sync video and audio when I play mp4 file. How can I do that?
Here's my pipeline:
gst-launch-0.10 filesrc location=./big_buck_bunny.mp4 ! \
qtdemux name=demux demux.video_00 ! queue ! TIViddec2 engineName=codecServer codecName=h264dec ! ffmpegcolorspace ! tidisplaysink2 video-standard=pal display-output=composite \
demux.audio_00 ! queue max-size-buffers=500 max-size-time=0 max-size-bytes=0 ! TIAuddec1 ! audioconvert ! audioresample ! autoaudiosink
Have you tried playing the video on a regular desktop without using TI's elements? GStreamer should take care of synchronization for playback cases (and many others).
If the video is perfectly synchronized on a desktop, then you have a bug in the elements specific to your target platform (TIViddec2 and tidisplaysink2). qtdemux should already put the expected timestamps on the buffers, so it is possible that TIViddec2 isn't copying those to its decoded buffers or that tidisplaysink2 isn't respecting them. (The same might apply to the audio part.)
I'd first check TIViddec2 by replacing the rest of the pipeline after it with a fakesink and running gst-launch in verbose mode. The output from fakesink should show you the output timestamps; check whether those are consistent. You can also put a fakesink right after qtdemux to check the timestamps it produces and see whether the decoders are respecting them.
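A hedged sketch of that check, based on the pipeline from the question (with -v, fakesink silent=false prints a last-message line per buffer, including its timestamps):
# Inspect the timestamps coming out of the video decoder
gst-launch-0.10 -v filesrc location=./big_buck_bunny.mp4 ! qtdemux name=demux demux.video_00 ! queue ! TIViddec2 engineName=codecServer codecName=h264dec ! fakesink silent=false
# Inspect the timestamps produced by qtdemux itself
gst-launch-0.10 -v filesrc location=./big_buck_bunny.mp4 ! qtdemux name=demux demux.video_00 ! queue ! fakesink silent=false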
It turned out I had used the wrong video framerate.

Syncing audio and video when mp4muxing in gst-launch-1.0

I have a Logitech C920 webcam that provides properly formatted h264 video, and a mic hooked up to an ASUS Xonar external USB sound card. I can read both and mux their data into a single file like this:
gst-launch-1.0 -e \
mp4mux name=muxy ! filesink location=/tmp/out.mp4 \
alsasrc device='hw:Device,0' do-timestamp=true ! audio/x-raw,rate=48000 ! audioconvert ! queue ! lamemp3enc ! muxy.audio_0 \
v4l2src do-timestamp=true ! video/x-h264,framerate=30/1,height=720 ! h264parse ! queue ! muxy.video_0
...but then I get poorly synchronized audio/video. The audio flow consistently starts with 250ms of garbage noise, and the resulting mp4 video is 250ms (7 or 8 frames at 30fps) out of sync.
Seems like the sources start simultaneously, but the sound card inserts 250ms of initialization junk every time. Or perhaps, the camera takes 250ms longer to start up but reports an incorrect start of stream flag. Or, maybe the clocks in my devices are out of sync for some reason. I don't know how to figure out the difference between these (and other) potential root causes.
Whatever the cause, I'd like to patch over the symptoms at least. I've been trying to do any of the following in the gstreamer pipeline, any of which would satisfy my requirements:
Cut out the first 250ms of audio
Delay the video by 250ms or 7 frames
Synchronize the audio and video timestamps properly with attributes like alsasrc slave-method or v4l2src io-mode
And I'm apparently doing it wrong. Nothing works. No matter what, I always end up with the video running 250ms/7 frames ahead of the audio. Adding the queue elements supposedly fixed the sync issue, as mediainfo now reports Duration values for audio and video within 20ms of each other, which would be acceptable. But that's not how the resulting videos actually play: when I clap my hands on camera, the sound arrives late.
This can be fixed in post processing but why not avoid the hassle and get it right, straight from the gst pipeline? I'm all out of tricks and just about ready to fall back to fixing every single video's sync by hand instead. Any ideas out there?
Thanks for any help, tips, ideas.
