Session start request using UDP sockets - Linux

I have been using UDP sockets to send and receive voice through RTP packetization. It is pretty straightforward: I send my encoded mic signal over IP using a User Datagram socket, and on the other end I receive the UDP/RTP packets and decode them so they can be played on my speakers.
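For context, here is a minimal sketch of that media path, assuming PCMU (G.711 u-law) with 20 ms frames and a hypothetical destination; a real sender would pass in the encoder's output instead of the dummy payload:

    import socket
    import struct
    import time

    DEST = ("192.0.2.20", 10000)   # hypothetical receiver address
    PT = 0                         # RTP payload type 0 = PCMU (G.711 u-law)
    SSRC = 0x12345678              # arbitrary stream identifier

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    seq, timestamp = 0, 0

    def send_frame(payload: bytes) -> None:
        global seq, timestamp
        # RTP fixed header: V=2,P=0,X=0,CC=0 | M=0,PT | seq | timestamp | SSRC
        header = struct.pack("!BBHII", 0x80, PT, seq, timestamp, SSRC)
        sock.sendto(header + payload, DEST)
        seq = (seq + 1) & 0xFFFF
        timestamp += 160           # 20 ms of audio at 8 kHz

    while True:
        send_frame(b"\xff" * 160)  # u-law silence; replace with encoder output
        time.sleep(0.02)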
I have been searching the internet for a while for a way to start up a session using UDP sockets. What I want is a handshake-like process between the two ends of my conversation; after the requests are acknowledged, the media layer (described in the first paragraph) would fire and start sending voice.
I have not been able to find any tutorials on session requests using UDP sockets, but I suppose it shouldn't be impossible (one user sends a request to build a session, and if the other user confirms, the media layer starts).
Has anyone done something like this before? Any info is welcome.

Firstly, UDP is a connectionless, unreliable protocol; you won't find anything like handshaking for negotiating a connection, i.e., no session management. But it's not a good idea to use TCP to transport RTP packets either, since it lacks real-time characteristics, so you have to stick with UDP. To solve the signaling problem you can use a protocol like SIP, the standard signaling protocol used in VoIP. SIP initiates a connection before any RTP packets are sent. To use SIP and RTP properly you might also need another protocol, SDP, which communicates which port to use for transmitting RTP along with other session info. You can get more info about these techniques here. Hope this helps!
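If full SIP is more than you need, a hand-rolled handshake over the same UDP socket is also workable. Here is a minimal sketch, loosely modeled on SIP's INVITE / 200 OK / ACK exchange; the message names and retransmission policy are assumptions, and the caller retransmits because UDP gives no delivery guarantee:

    import socket

    def call(peer: tuple) -> bool:
        """Caller side: request a session; True means the media layer may start."""
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        sock.settimeout(1.0)
        for _ in range(5):                    # retransmit up to 5 times
            sock.sendto(b"INVITE", peer)
            try:
                data, addr = sock.recvfrom(64)
            except socket.timeout:
                continue
            if data == b"OK" and addr == peer:
                sock.sendto(b"ACK", peer)     # confirm, then start sending voice
                return True
        return False

    def answer(port: int) -> None:
        """Callee side: wait for a request, confirm it, then start the media layer."""
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        sock.bind(("0.0.0.0", port))
        while True:
            data, addr = sock.recvfrom(64)
            if data == b"INVITE":
                sock.sendto(b"OK", addr)
            elif data == b"ACK":
                break                         # session is up; media layer starts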

Related

RTSP and RTP for streaming from an IP Camera

I have a simple application in C# that opens an RTSP session and sends the appropriate commands like DESCRIBE, SETUP, etc. to control an RTP data stream.
My question is this: does the TCP session (for the RTSP communication) have to stay open while streaming the data over RTP? Without going into details as to why, I'd like to be able to close the RTSP session after the RTP streaming is set up.
From RFC 7826, Real-Time Streaming Protocol Version 2.0:
A persistent connection is RECOMMENDED to be used for all transactions between the server and client, including messages for multiple RTSP sessions. However, a persistent connection MAY be closed after a few message exchanges. For example, a client may use a persistent connection for the initial SETUP and PLAY message exchanges in a session and then close the connection. Later, when the client wishes to send a new request, such as a PAUSE for the session, a new connection would be opened. This connection may be either transient or persistent.
So no, it is not required to be open while streaming data.
The short answer is yes, you must keep the RTSP socket open.
If the stream is over TCP, the RTP packets travel interleaved over the RTSP session's socket. If the stream is over UDP, the server uses the open socket to decide whether it should keep sending packets to the client, because UDP transmission gives no feedback about whether the client is still alive.
Edit:
I think the answer marked correct is actually incorrect. In general, cameras implement RTSP version 1, not version 2. In my 10 years of experience with video surveillance systems, I have found several camera models that stop sending RTP packets after the RTSP session socket is closed.
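Given that camera behavior, a common mitigation is to keep the RTSP control connection open and refresh the session periodically with GET_PARAMETER (or OPTIONS). A minimal sketch; the host, URL, session ID, and interval are assumptions, and a real client would carry the CSeq over from its earlier requests:

    import socket
    import time

    RTSP_HOST, RTSP_PORT = "192.0.2.50", 554   # hypothetical camera
    SESSION_ID = "12345678"                    # taken from the SETUP response

    sock = socket.create_connection((RTSP_HOST, RTSP_PORT))
    cseq = 10
    while True:
        request = (
            "GET_PARAMETER rtsp://%s/stream RTSP/1.0\r\n"
            "CSeq: %d\r\n"
            "Session: %s\r\n"
            "\r\n" % (RTSP_HOST, cseq, SESSION_ID)
        )
        sock.sendall(request.encode("ascii"))
        sock.recv(4096)    # discard the reply; the request only proves liveness
        cseq += 1
        time.sleep(30)     # RTSP session timeout commonly defaults to 60 s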

Using PyAV to generate audio broadcast server (UDP socket)

I have to create a service which captures the audio from the PC microphone and broadcasts it as UDP packets. I am on a Debian platform and I have to use Python (3.7).
I would like to use PyAV because I have to link this broadcasting system to a local custom WebRTC service using aiortc, which relies on PyAV.
I have to do this because I cannot access the same audio source (ALSA) from several processes (RTC peers), so I was thinking of creating a UDP broadcasting system in a localhost environment. Is this the best practice? Do you have any other ideas?
I have noticed here that with the call av.open("udp://xxx:nnn", format="alsa") I should be able to receive audio UDP packets, but I am not sure how to write a UDP server which captures from the mic and sends the UDP packets. How do I create the server side of this implementation? In particular, I managed to capture the audio with av.open("hw:0", format="alsa"); how can I send the captured buffer over UDP sockets?
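One possible shape for the sender, sketched under these assumptions: PyAV can open the ALSA device hw:0 (as in the question), and FFmpeg's MPEG-TS muxer over UDP is available as a simple framing for the datagrams. The address, port, codec, and sample rate are all placeholders:

    import av

    inp = av.open("hw:0", format="alsa")                 # capture from the mic
    out = av.open("udp://127.0.0.1:20000", mode="w", format="mpegts")
    out_stream = out.add_stream("aac", rate=48000)       # transcode for the mux

    try:
        for frame in inp.decode(audio=0):                # demux + decode mic frames
            for packet in out_stream.encode(frame):
                out.mux(packet)
    finally:
        for packet in out_stream.encode(None):           # flush the encoder
            out.mux(packet)
        out.close()
        inp.close()

A receiver would then presumably open av.open("udp://127.0.0.1:20000", format="mpegts") rather than format="alsa", since the ALSA format only applies to the sound device itself.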

Asterisk RTP data to Node.JS app

I've successfully set up Asterisk on my server using the res_pjsip Hello World configuration from their wiki, and I want to be able to forward the RTP data to a Node.JS app, which can interpret RTP. I've heard about directmedia and directrtpsetup (see this Stack Overflow question), but I'm not sure if that's what I want. So my question is this:
Should I use directmedia / directrtpsetup to send voice data to my Node.JS app, or should I use some sort of Asterisk functionality to forward RTP packets? If the latter, how can Asterisk forward just the voice data?
I can clarify if needed, but hopefully this is more specific than my last questions. Thanks!
UPDATE: Having poked around the Asterisk docs and messed with Wireshark, I think I have two options:
1. Figure out if there's a channel driver for Asterisk that just sends RTP, without any signaling, or
2. Capture the RTP stream with Wireshark or something, send the packets to the Node.JS app, and inject the return packets into the RTP stream.
Asterisk is a PBX; it is not suitable for simply redirecting RTP data.
No, there is no reason to have a channel driver "without signalling". What would anyone use it for? How would you determine that a call has started with no signaling? It would be useless.
You can write such an app in C/C++, or use software designed for traffic capture: libpcap, tcpdump, etc.
You can also use audio stacks: libalsa, JACK.
The best option, however, is to create or find a full-featured SIP client and use it.
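If you do end up receiving the RTP yourself (for example, by having a capture tool or Asterisk deliver it to a local UDP port), the voice data is just the bytes after the fixed 12-byte RTP header. A sketch, with the port as an assumption; CSRC lists and header extensions are ignored:

    import socket
    import struct

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("0.0.0.0", 10000))              # hypothetical RTP port

    while True:
        packet, addr = sock.recvfrom(2048)
        if len(packet) < 12:
            continue                           # too short to be RTP
        b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
        payload_type = b1 & 0x7F
        payload = packet[12:]                  # the encoded voice itself
        # hand `payload` to the Node.JS app here, e.g. over another socket
        print("seq=%d pt=%d ssrc=%#010x bytes=%d"
              % (seq, payload_type, ssrc, len(payload)))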

remote dial using SIP client

I want to remote dial from my PC using a simple non-SIP client program which I wrote, which sends commands to a proprietary SIP client that accepts remote commands via a TCP connection. The proprietary SIP client will then dial the remote party using my PC's IP and port number in the SDP for RTP. Is this possible in principle? Are there any open-source clients available that use this concept? Is there any documentation available (IETF RFCs, blogs, etc.)?
Appreciate any help in this matter.
Check out pjsip; it's an open-source, cross-platform SIP stack for all major platforms, with an API in C and an API wrapper for Python, whichever you prefer. Link your TCP parsing code to pjsip and call its functions to initiate a call; there are examples of how to do this on their site.
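As a rough sketch of how that could look with the pjsua2 Python wrapper (the URIs and port are placeholders, and error handling is omitted):

    import pjsua2 as pj

    ep = pj.Endpoint()
    ep.libCreate()
    ep.libInit(pj.EpConfig())
    tcfg = pj.TransportConfig()
    tcfg.port = 5060                            # assumed local SIP port
    ep.transportCreate(pj.PJSIP_TRANSPORT_UDP, tcfg)
    ep.libStart()

    acc_cfg = pj.AccountConfig()
    acc_cfg.idUri = "sip:me@example.com"        # hypothetical local identity
    acc = pj.Account()
    acc.create(acc_cfg)

    def dial(destination: str) -> pj.Call:
        """Invoked by your TCP command parser, e.g. dial("sip:bob@example.com")."""
        call = pj.Call(acc)
        call.makeCall(destination, pj.CallOpParam(True))
        return call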
If I understand correctly, here is what you want to do:
        TCP                SIP/SDP/RTP
PC <=========> SIP client <===========> softswitch
Actually, TCP between the PC and the SIP client will probably be fine for signalling, but not for media, as the RTP media stream is usually sent over UDP.
In my opinion, the first step is to make sure that your softswitch will accept sending RTP packets to an IP address which is not the same as the SIP client's (I think most of them refuse, for security reasons). If it accepts, and if you have no NAT between your SIP client and your PC, you should be able to get the RTP stream sent directly to your PC. In this case, you have to retrieve the RTP packets, possibly reorder them, decode their payload, and feed them to your speakers.
If your softswitch refuses to send RTP packets to an IP address different from the SIP IP address, then you have to forward the RTP packets from your SIP client to your PC. But if you can't modify your SIP client to do this (and that's probably the case, as it's proprietary software), you're probably stuck.
To test whether your softswitch accepts sending RTP packets to an IP address other than the signalling one, you can use sipp and specify a remote media IP address different from the SIP signalling IP address.
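For example, sipp's built-in UAC scenario can advertise a media IP (-mi) different from the signalling IP (-i); all addresses below are placeholders:

    import subprocess

    subprocess.run([
        "sipp",
        "-sn", "uac",                  # built-in UAC scenario
        "-i", "192.0.2.10",            # SIP signalling address (SIP client host)
        "-mi", "192.0.2.20",           # media address placed in the SDP (your PC)
        "-m", "1",                     # stop after one call
        "softswitch.example.com",      # hypothetical softswitch
    ])

If RTP then arrives at 192.0.2.20, the softswitch tolerates the address mismatch.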

Structure of a Voice Chat application (Client/Server)?

I need an EXPERT opinion please, and sorry if the question itself is confused.
I have been reading about the structure of VoIP applications (client/server), and mostly UDP is recommended for voice streams. I also checked some voice chat applications like Paltalk and Inspeak, and their sites mention they use UDP voice streams, which doesn't seem correct to me, for the reasons below.
I examined the traffic/ports used by Paltalk and Inspeak. They have UDP and TCP ports open, and using a packet sniffer I can see there is not much UDP communication; mostly it is TCP communication going on.
Also, as far as I know, with UDP a server cannot send data to a client behind NAT (a DSL router), and "UDP broadcast" is not an option for internet-based voice chat applications. That's why Yahoo's documentation mentions that Yahoo Messenger switches to TCP if UDP communication is not possible.
So my question is:
Am I misunderstanding something in my statements above?
If UDP is not possible, do those chat applications use TCP streams for voice?
I have experienced that TCP voice streams create delay (no voice breaking, but delay in the voice), so what should the structure of voice chat server/client communication be?
So far I think that clients could send data as UDP packets to the server, and the server could distribute the packets to clients over TCP streams. Is this a proper solution? I mean, is this what commercial voice chat applications do?
Thanks; your answer will help me and a lot of other programmers.
JF
UDP has less overhead (in terms of total packet size), so you can squeeze more audio into the channel's bandwidth.
UDP is also unreliable: packets sent may never be received, or could be received out of order. That's actually OK for voice applications, since a small amount of lost packets can be tolerated with some loss of signal quality (as opposed to downloading a file, where every byte counts).
Can you use TCP? Sure, why not; it has slightly more overhead, but that may not matter.
SIP is a signaling standard for voice/media sessions that supports UDP and TCP. Most deployments use UDP because of the lower overhead.
The Skype protocol prefers UDP where possible, and falls back to TCP.
In SIP situations, the NAT problem is solved by using a NAT keep-alive packet (any request/response data) to keep the channel up and open, and by exploiting the fact that most NATs will accept replies on the same source port the connection was opened from. This isn't foolproof and often requires a proxy server mediating the connection between two NATed peers, but it's used in many deployments.
STUN, TURN, and ICE are additional methods that help with NAT scenarios, and especially in p2p (serverless) situations.
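A sketch of the keep-alive idea; the server address, interval, and payload are assumptions, tuned in practice to the NAT's UDP mapping timeout:

    import socket
    import threading
    import time

    SERVER = ("203.0.113.5", 5060)   # hypothetical rendezvous/media server

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.connect(SERVER)             # pins one local source port for all traffic

    def keepalive() -> None:
        while True:
            sock.send(b"\x00")       # any tiny payload; it only refreshes the mapping
            time.sleep(15)           # well under typical NAT UDP timeouts

    threading.Thread(target=keepalive, daemon=True).start()
    # ... send and receive media on `sock`; the server replies to the same
    # source port, which the NAT forwards back through the open mapping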
Info regarding NAT issues and media:
http://www.voip-info.org/wiki/view/NAT+and+VOIP
http://en.wikipedia.org/wiki/UDP_hole_punching
http://www.h-online.com/security/features/How-Skype-Co-get-round-firewalls-747197.html
If you're implementing a voice service of some kind, a system like FreeSWITCH provides most of the tools you need to deliver media to distributed clients:
http://www.freeswitch.org/
I see the question is 3 years old with no accepted answer, so I'll take a shot at it.
1. Your statements are correct.
2. Correct; either TCP or UDP can be used for the audio stream.
3. Combining TCP and UDP for the audio stream is not useful. If UDP works for transmission to the server, it will work for reception. That's how all NAT firewalls work: they take a datagram received from an internal host and send it to the remote host after rewriting the IP header so the packet seems to come from them, and when they receive the response they forward it back to the internal host. NAT firewalls differ in how long the NAT mapping remains alive, but this does not matter for the audio part of the call, as there is a constant flow of audio in both directions during a call. It matters more for the signalling part of the call, which uses the SIP protocol, so I would recommend using TCP for SIP: a TCP session has a default timeout of 900 s, making keep-alive messages less frequently needed.
Now, some of the applications you mentioned do not use SIP for session initiation, and hence have proprietary ways of signalling.
Other applications take advantage of something called "hole punching" to allow direct client-to-client (peer-to-peer) communication, such as Skype. The advantage is that the server does not sit in the middle of the voice stream, which can effectively reduce latency, making TCP a potential choice for the audio stream.
The developers of Asterisk, the famous open-source PBX, realized that SIP's design requires having lots of ports open, and they developed their own protocol, IAX, to transmit signalling and media over one port. I would encourage you to consider implementing IAX for your client/server, because it ensures that if a client is able to connect (through signalling), then it's able to make calls.