Freeswitch PJSIP client, PESQ score incorrect - voip

I am trying to measure voice quality (PESQ score) by playing a file at the server, recording it at the client side, and then passing both files to the ITU reference implementation of PESQ score computation.
The codec negotiated is PCMU (G.711), so I should get a PESQ score of 4.1 (or something near that), but the scores I am getting are in the range of 3.4-3.7.
After computing scores for more than 40 recordings, the PESQ scores I am getting are:
AVG: 3.4278035714
MAX: 3.707
MIN: 3.343
I don't understand what I am doing incorrectly. Why am I not getting scores close to 4.1?
Setup:
File (PCM#8000) --- FreeSwitch (PCMU#8000) --- PJSIP client (PCMU#8000) --- File (PCM#8000)
Server and client run on the same machine.
Server: Freeswitch
Client: PJSUA Python client
Audio file format (#Server): RIFF (little-endian) data, WAVE audio, Microsoft PCM, 16 bit, mono 8000 Hz
Audio file format (#client, recorded by client): RIFF (little-endian) data, WAVE audio, Microsoft PCM, 16 bit, mono 8000 Hz
Codec negotiated between client and server is PCMU#8000:
t=0 0
m=audio 28734 RTP/AVP 0 96
a=rtpmap:0 PCMU/8000
a=rtpmap:96 telephone-event/8000
a=fmtp:96 0-16
a=ptime:20
a=rtcp:28735 IN IP4 192.168.1.236
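For reference, this is roughly how each pair of files is scored (a minimal sketch, assuming the pesq PyPI package, which wraps the ITU-T P.862 reference code; the file names are placeholders):

from scipy.io import wavfile
from pesq import pesq  # PyPI "pesq" package, a wrapper around ITU-T P.862

# reference = the file played at the server,
# degraded  = the file recorded by the PJSUA client
rate, ref = wavfile.read("reference_8k.wav")
_, deg = wavfile.read("client_recording_8k.wav")

# Both files are 16-bit mono 8000 Hz, so narrowband ('nb') mode applies.
print("PESQ MOS:", pesq(rate, ref, deg, 'nb'))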

Related

How to solve Jitter Buffer problem in receiving audio RTP stream (bad sound quality) in PJSIP?

I'm a newbie to pjsip and want to build an RTP stream receiver using pjsip.
Setup:
I want to use the specific L16/16000/1 codec; I enabled it in "config_site.h" when compiling the pjsip project and checked that it is available.
Receiver:
BeagleBone
Cross-compiled pjsip and installed all required libs and sample apps
Sender:
Another Windows PC in the same Network using FFmpeg to transmit Audio Stream via Multicast
I found streamutil.c (in the pjsip sample apps), which does similar things, both sending and receiving. For the sake of simplicity, I'm using the same cross-compiled streamutil binary.
SENDER:
..\ffmpeg -re -stream_loop -1 -i test.mp3 -ar 16000 -acodec pcm_s16be -b:a 128k -ac 1 -payload_type 123 -f rtp udp://239.255.255.211:5500?pkt_size=652
......
Output #0, rtp, to 'udp://239.255.255.211:5500?pkt_size=652':
Metadata:
title : -----
artist : --------
album : -------
date : 2019
track : 1
encoder : Lavf58.20.100
Stream #0:0: Audio: pcm_s16be, 16000 Hz, mono, s16, 256 kb/s
Metadata:
encoder : Lavc58.35.100 pcm_s16be
SDP:
v=0
o=- 0 0 IN IP4 127.0.0.1
s=GREATEST HITS (2) [1 HOUR 20 MINUTES LONG]
c=IN IP4 239.255.255.211/5
t=0 0
a=tool:libavformat 58.26.101
m=audio 5500 RTP/AVP 123
b=AS:256
a=rtpmap:97 L16/16000/1
a=rtpmap:123 L16/16000/1
a=control:streamid=
size= 833kB time=00:00:25.91 bitrate= 263.4kbits/s speed= 1x
RECEIVER LOG:
./streamutil --mcast-addr=239.255.255.211 --recv-only --codec=L16/16000/1
...
...
17:05:05.178 strm0x55dee1537f48 Jitter buffer starts returning normal frames (after 1 empty/lost)
17:05:05.246 strm0x55dee1537f48 Jitter buffer empty (prefetch=0), plc invoked
17:05:05.266 strm0x55dee1537f48 Jitter buffer starts returning normal frames (after 1 empty/lost)
17:05:05.325 strm0x55dee1537f48 Jitter buffer empty (prefetch=0), plc invoked
17:05:05.344 strm0x55dee1537f48 Jitter buffer starts returning normal frames (after 1 empty/lost)
17:05:05.422 strm0x55dee1537f48 Jitter buffer empty (prefetch=0), plc invoked
Tried so far:
set a different payload_type
set the specific codec in streamutil as a parameter
varied all other parameters in FFmpeg, e.g. bitrate, clock rate, channels
Checked a working stream:
I face no issue if I use a *.sdp file to receive the RTP stream in VLC.
SDP file:
v=0
o=- 0 0 IN IP4 127.0.0.1
s=GREATEST HITS (2) [1 HOUR 20 MINUTES LONG]
c=IN IP4 239.255.255.211/5
t=0 0
a=tool:libavformat 58.26.101
m=audio 5500 RTP/AVP 123
b=AS:256
a=rtpmap:97 PCMU/8000/1
a=rtpmap:123 PCMU/8000/1
a=control:streamid=
I have googled a lot but am now stuck on this problem.
So finally my question is:
How can I get the same output via pjsip, without this jitter buffer logging and the dropped sound?
Any help would be greatly appreciated!
Please check the audio bitrate and clock frequency that are set on the ffmpeg side.
Make sure the timestamp of each packet is updated according to the clock frequency (check the Wireshark logs).
You are trying to do VOD (from an mp3 file) rather than live data transmission, which may cause problems; please check the timestamps of the RTP packets.
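As a sanity check when reading those Wireshark captures, the expected timestamp step follows directly from the sender command above. A sketch (the 12-byte header assumes a plain RTP header with no CSRCs or extensions):

# L16/16000 mono with pkt_size=652, as in the ffmpeg command above
PKT_SIZE = 652        # UDP payload size requested via ?pkt_size=652
RTP_HEADER = 12       # fixed RTP header, assuming no CSRCs or extensions
SAMPLE_RATE = 16000
BYTES_PER_SAMPLE = 2  # pcm_s16be, 1 channel

payload_bytes = PKT_SIZE - RTP_HEADER          # 640 bytes of audio per packet
samples = payload_bytes // BYTES_PER_SAMPLE    # 320 samples per packet
print("expected timestamp step:", samples)                                # 320
print("expected packet gap: %.1f ms" % (1000.0 * samples / SAMPLE_RATE))  # 20.0

If the captured timestamps do not advance by roughly that amount per packet, that would explain the jitter buffer underruns in the log above.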

Converting m4a to highest quality wav and 320kbps mp3

I have an .m4a audio file of size 805 kB which I want to convert to wav. My purpose is to (1) get the highest quality .wav audio file and (2) get a 320 kbps .mp3 audio file.
I used an online service (https://www.online-convert.com) to convert it. When I converted directly, without optional settings, the file size increased to 11.6 MB; when I converted the same audio with optional settings, changing the bit resolution to 32 bit and the sampling rate to 96000 Hz, the file size jumped to 50.7 MB.
There are only two optional settings in the web service:
Bit resolution - no change, 8 bit, 16 bit, 24 bit, 32 bit
Sampling rate - no change, 1000 Hz, 8000, 11025, 16000, 22050, 24000, 32000, 44100, 48000, 96000 Hz
And one radio button for Normalize audio that can be checked and unchecked.
Can someone explain why the file size increases, and what settings I must keep to get the highest quality from the original 805 kB audio?
Thanks
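For what it's worth, the growth is plain uncompressed-PCM arithmetic: size = sample rate × bytes per sample × channels × duration. A sketch (the stereo / ~66-second figures are assumptions, chosen because they reproduce the reported sizes):

def wav_size_bytes(sample_rate, bit_depth, channels, seconds):
    # Uncompressed PCM costs sample_rate * bit_depth/8 * channels bytes per second.
    return sample_rate * (bit_depth // 8) * channels * seconds

DURATION = 66  # seconds (assumed; consistent with the sizes reported above)
print(wav_size_bytes(44100, 16, 2, DURATION) / 1e6)  # ~11.6 MB (16 bit / 44.1 kHz)
print(wav_size_bytes(96000, 32, 2, DURATION) / 1e6)  # ~50.7 MB (32 bit / 96 kHz)

Upsampling to 96 kHz / 32 bit adds no quality that was not in the 805 kB source; it only inflates the file.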

SDP file for PCM audio stream?

I am multicasting PCM audio from mpd through PulseAudio to the network and receiving it with VLC. This works fine as long as the audio is 44.1 kHz - every other sampling rate results in VLC complaining that it needs a correct SDP file describing the stream.
Is there a way to either
a) save the current "settings" of a running VLC session to an SDP file (to edit later), or
b) create an SDP file by hand which includes these settings:
rtp://#239.0.0.100:27028
24000 Hz sampling rate
PCM audio, 16 bit
Thanks for any help or pointers!
This is what PulseAudio outputs with a debug level of 4 and works as the content of test.sdp:
v=0
o=root 3619500147 0 IN IP4 10.12.65.99
s=PulseAudio RTP Stream on RAMPv6
c=IN IP4 239.0.0.100
t=3619500147 0
a=recvonly
m=audio 27028 RTP/AVP 98
a=rtpmap:98 L16/24000/2
a=type:broadcast
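If option b) is the route taken, the rtpmap line carries everything that varies: L16 means 16-bit big-endian linear PCM, followed by rate/channels. A minimal sketch for generating such a file by hand (the helper is made up for illustration; the values mirror the stream above):

# Illustrative helper; the values mirror the PulseAudio SDP above.
def sdp_for_pcm(mcast_addr, port, payload_type, rate, channels):
    return "\n".join([
        "v=0",
        "o=- 0 0 IN IP4 127.0.0.1",
        "s=PCM stream",
        "c=IN IP4 %s" % mcast_addr,
        "t=0 0",
        "m=audio %d RTP/AVP %d" % (port, payload_type),
        "a=rtpmap:%d L16/%d/%d" % (payload_type, rate, channels),
    ])

print(sdp_for_pcm("239.0.0.100", 27028, 98, 24000, 2))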

AAC RTP timestamps and synchronization

I am currently streaming audio (AAC-HBR at 8 kHz) and video (H264) using RTP. Both feeds work fine individually, but when put together they get out of sync pretty fast (less than 15 sec).
I am not sure how to increment the timestamp in the audio RTP header. I thought it should be the time difference between two RTP packets (around 127 ms) or a constant increment of 1/8000 (0.125 ms), but neither worked. Instead I managed to find a sweet spot: when I increment the timestamp by 935 for each packet, it stays synchronized for about a minute.
The AAC frame size is 1024 samples, so try incrementing the timestamp by 1024 per frame, which at 8 kHz corresponds to (1/8000) * 1024 = 128 ms. Or use a multiple of that, in case your packet carries multiple AAC frames.
Does that help?
A bit late, but I thought of putting up my answer.
The timestamp increment between audio RTP packets == the number of audio samples contained in each RTP packet.
For AAC, each frame consists of 1024 samples, so the timestamp on the RTP packets should increase by 1024.
The difference between the clock times of 2 RTP packets = (1/8000) * 1024 = 128 ms, i.e. the sender should send the RTP packets 128 ms apart.
A bit more information for other sampling rates:
AAC sampled at 44100 Hz means 44100 samples of signal in 1 sec.
So 1024 samples take (1000 ms / 44100) * 1024 = 23.21995 ms.
So the timestamp step between 2 RTP packets is still 1024, but
the difference in clock time between 2 RTP packets in the RTP session should be 23.21995 ms.
Trying to correlate with another example:
for the G.711 family (PCMU, PCMA), the sampling frequency = 8 kHz.
So a 20 ms packet should contain 8000/50 == 160 samples,
and hence the RTP timestamps are incremented by 160.
The difference in clock time between 2 RTP packets should be 20 ms.
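The arithmetic above, condensed into a sketch (values taken straight from the examples):

# RTP timestamp step = samples per packet; wall-clock gap = samples / sample rate.
def rtp_step(samples_per_packet, sample_rate):
    gap_ms = 1000.0 * samples_per_packet / sample_rate
    return samples_per_packet, gap_ms

print(rtp_step(1024, 8000))   # AAC @ 8 kHz    -> step 1024, gap 128.0 ms
print(rtp_step(1024, 44100))  # AAC @ 44.1 kHz -> step 1024, gap ~23.22 ms
print(rtp_step(160, 8000))    # G.711, 20 ms   -> step 160,  gap 20.0 ms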
IMHO, video and audio de-sync on Android is difficult to fight if they are taken from different media recorders. They just capture different start frames, and there seems to be no way to find out how big the de-sync is and to adjust it with audio or video timestamps on the fly.

How to calculate effective time offset in RTP

I have to calculate the time offset between packets in RTP streams. With a video stream encoded with the Theora codec I have timestamp fields like
2856000
2940000
3024000
...
So I assume that the transmission offset is 84000. With the Speex audio codec I have timestamp fields like
38080
38400
38720
...
So I assume that the transmission offset is 320. Why are the values so different? Are they microseconds, milliseconds, or what? Can I generalize a formula to calculate the delay between packets in microseconds that works with any codec? Thank you.
RTP timestamps are media dependent. They use the sampling rate of the codec in use. You have to convert them to milliseconds before comparing them with your clock or with timestamps from other RTP streams.
Added:
To convert the timestamp to seconds, just divide the timestamp by the sample rate. For most audio codecs, the sample rate is 8 kHz.
See here for a few examples.
Note that video codecs typically use 90000 for the timestamp rate.
Instead of guessing at the clock rate, look at the a=rtpmap line in the sdp for the payload in use. Example:
m=audio 5678 RTP/AVP 0 8 99
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000
a=rtpmap:99 AAC-LD/16000
If the payload is 0 or 8, timestamps are 8 kHz. If it's 99, they're 16 kHz. Note that the rtpmap line has an optional 'channels' parameter, as in "a=rtpmap:<payload> <name>/<rate>[/<channels>]".
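A tiny sketch of that lookup (the parser is illustrative, not from any particular library):

import re

# Pull payload type, codec name, clock rate, and optional channel count
# out of an "a=rtpmap:<payload> <name>/<rate>[/<channels>]" line.
def parse_rtpmap(line):
    m = re.match(r"a=rtpmap:?\s*(\d+)\s+([^/]+)/(\d+)(?:/(\d+))?", line)
    pt, name, rate, channels = m.groups()
    return int(pt), name, int(rate), int(channels) if channels else 1

print(parse_rtpmap("a=rtpmap:99 AAC-LD/16000"))  # (99, 'AAC-LD', 16000, 1)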
Been researching this question for about an hour for the audio case. It seems the answer is: the RTP timestamp is incremented by the number of audio time units (sample frames) in a packet. Take this example, where you have a stream of encoded 2-channel audio, sampled at 44100 Hz before the audio was encoded. Say that you send 512 audio samples (256 time units, because we have 2-channel audio) in every packet. Assuming the first packet has a timestamp of 0 (it should be random, though, according to the RTP spec (RFC 3550)), the second timestamp would be 256, and the third 512. The receiver can convert the value back to an actual time by dividing the timestamp by the audio sample rate, so the first packet would be T0, the second equals 256/44100 = 0.0058 seconds, the third equals 512/44100 = 0.0116 seconds, etc.
Someone please correct me if I'm wrong; I'm not sure why there aren't any articles online that state it this way. It would be more complicated if the resolution of the RTP timestamp were different from the sample rate of the audio stream; nevertheless, converting the timestamp to a different resolution is not complicated. Use the same example as before, but change the resolution of the RTP timestamp to 90 kHz, as in MPEG-4 Audio (RFC 3016). On the source side the first timestamp is 0, the second is 90000 * (256/44100) = 522, and the third is 1044. On the receiver, the time is 0 for the first packet, 522/90000 = 0.0058 for the second, and 1044/90000 = 0.0116 for the third. Again, someone please correct me if I'm wrong.
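A sketch of that worked example (44.1 kHz audio, 256 sample frames per packet, 90 kHz RTP clock, all taken from the paragraph above):

SAMPLE_RATE = 44100
RTP_CLOCK = 90000
FRAMES_PER_PACKET = 256  # 512 samples / 2 channels

# RTP clock equal to the sample rate: timestamp == sample-frame count.
print(FRAMES_PER_PACKET / SAMPLE_RATE)                   # ~0.0058 s per packet

# RTP clock of 90 kHz (RFC 3016 style): rescale before converting to seconds.
ts = round(RTP_CLOCK * FRAMES_PER_PACKET / SAMPLE_RATE)  # 522
print(ts, ts / RTP_CLOCK)                                # 522, ~0.0058 s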