WebRTC peer to nodejs - node.js

I would like to use WebRTC, but instead of p2p I would like to broadcast my audio/video feed to Node.js in real time. I can encode the video at 125 kbps and 10-12 frames per second for smooth transmission. The idea is that Node.js will receive this feed, save it, and broadcast it at the same time as a realtime session/webinar. I can connect p2p, but I am not sure how to:
send the feed to Node.js instead of to a peer
receive the feed on Node.js

The WebRTC protocol suite is complex enough that implementing a selective forwarding unit (SFU) from scratch would likely take a team of experts at least a year. It requires handling a variety of networking protocols, including UDP datagrams and TCP, and it may require transcoding between audio and video codecs.
The good news is that browser endpoints are now excellent, and open-source server implementations are good enough to get to a minimum viable product.
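
For a concrete feel of the browser-to-Node leg, here is a minimal sketch of a Node.js endpoint that accepts an offer and receives the audio/video tracks. It assumes the wrtc (node-webrtc) npm package and leaves the signaling transport to you; handleOffer and sendAnswer are placeholder names, and saving or re-broadcasting the media is where an SFU or ffmpeg would come in.

// Minimal sketch, not production code. Assumes: npm install wrtc
const { RTCPeerConnection } = require('wrtc');

async function handleOffer(offerSdp, sendAnswer) {
  const pc = new RTCPeerConnection();

  pc.ontrack = ({ track }) => {
    // The Node process is now receiving the browser's audio/video track.
    // To save or re-broadcast it you would typically hand the media to
    // ffmpeg/GStreamer or use an SFU such as mediasoup instead.
    console.log('receiving', track.kind, 'track');
  };

  await pc.setRemoteDescription({ type: 'offer', sdp: offerSdp });
  const answer = await pc.createAnswer();
  await pc.setLocalDescription(answer);
  sendAnswer(pc.localDescription.sdp); // deliver back over your signaling channel
}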

Related

Broadcasting 2 WebRTC signals to multiple WebRTC clients

I want to create an online-classes type of site. I want the tutor to broadcast to all students, and if a student has a question they can broadcast it to the whole class, meaning only a max of 2 people will be broadcasting at any time. I would want to use WebRTC, but connecting around 30 people would create too much overhead. Is there a way of broadcasting 2 signals to 30 users using WebRTC, where the 30 remain dumb clients, while using Socket.IO for signalling?
I came across RTMP while doing my research and would like to ask whether the tutor and the student (with the question) could "stream" their sessions to the other students, i.e. the two of them communicate over WebRTC and their streams are then broadcast to the others.
Can it be done? Can it be done using React, Socket.IO, WebRTC and/or RTMP?
One option would be to send the stream to some users, then let those users retransmit it to others. This can be done with WebRTC scalable broadcasting, though latency grows the more users sit in between.
A more common solution is an SFU. With this approach the sender only needs to send one stream to the server, and the server handles all the retransmission to the other users. So by having a more powerful server you can easily scale your application to more users. There are several ways to implement this:
Janus-gateway
Kurento
Mediasoup
Here is a simple example project of how videoconferencing is implemented with mediasoup.
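
To give an idea of what that looks like, here is a rough sketch of the server-side bootstrap with mediasoup v3; the codec list and announcedIp are placeholders, and the signaling between browser and server is up to you. The 2-broadcasters/30-viewers case maps naturally onto it: only the tutor and the asking student produce, everyone else only consumes.

// Rough mediasoup v3 bootstrap sketch; not a complete signaling server.
const mediasoup = require('mediasoup');

async function startSfu() {
  const worker = await mediasoup.createWorker();
  const router = await worker.createRouter({
    mediaCodecs: [
      { kind: 'audio', mimeType: 'audio/opus', clockRate: 48000, channels: 2 },
      { kind: 'video', mimeType: 'video/VP8', clockRate: 90000 },
    ],
  });

  // One WebRTC transport per connecting peer; announcedIp is a placeholder.
  const transport = await router.createWebRtcTransport({
    listenIps: [{ ip: '0.0.0.0', announcedIp: '203.0.113.1' }],
    enableUdp: true,
    enableTcp: true,
    preferUdp: true,
  });

  // The tutor's browser calls produce() through its transport; each student
  // gets their own transport and calls consume() to receive the stream.
  return { worker, router, transport };
}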

What is the role of an SFU (Janus, mediasoup or Medooze) in a WebRTC application

I'm working on a WebRTC application that uses the simple-peer npm package.
I want to know what the purpose of all these things is (SFU, Janus, mediasoup, Medooze) and how I can integrate them to improve my application's performance.
PS: I'm using a Node.js server to handle the requests and signaling between peers in my architecture. Are those servers and services required to make my application perform well?
Hope I can find an answer here...
With regular WebRTC every peer needs to send and receive its data separately to and from every other peer.
So let's say there are 10 peers in a video chat. Then every peer has to send their video 9 times simultaneously and also receive 9 streams.
Every peer would use a large amount of upload bandwidth, which they usually don't have.
SFUs solve this problem by having every peer send only one stream to a media server and letting that server do all the routing to the other peers. This way every peer only sends 1 stream and receives 9. The maximum download bandwidth is usually higher than the upload bandwidth.
There is also something called simulcast, which automatically switches the stream quality depending on the available bandwidth of the peer. I have been able to achieve this with mediasoup.
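
For reference, simulcast is enabled on the sending browser with the standard RTCPeerConnection API; a rough sketch follows (the rid names and bitrates are arbitrary, and the offer/answer exchange with the SFU is omitted).

// Browser-side sketch: publish one camera track as three simulcast encodings.
async function publishSimulcast(pc) {
  const stream = await navigator.mediaDevices.getUserMedia({ video: true });
  pc.addTransceiver(stream.getVideoTracks()[0], {
    direction: 'sendonly',
    sendEncodings: [
      { rid: 'low',  scaleResolutionDownBy: 4, maxBitrate: 150000 },
      { rid: 'mid',  scaleResolutionDownBy: 2, maxBitrate: 500000 },
      { rid: 'high', maxBitrate: 1500000 },
    ],
  });
  // The usual offer/answer exchange with the SFU follows here;
  // the SFU then decides which encoding to forward to each peer.
}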
Following up on my question above and a lot of research afterwards, I found that:
An SFU is the server-side technology that drives the WebRTC communication:
how to produce (share) the stream between peers,
how to consume this stream of media on the other peers,
and how the topology, if I can call it that, works between PRODUCERS (the ones who share the stream) and the CONSUMERS.
This is the general idea; you have to go deeper for the implementation.
The services I asked about, like mediasoup, Medooze, etc., are servers that implement this SFU technology.
You could pick one of them and learn how to implement an SFU through it.
The WebRTC SFU server can:
Forward: Each peer only needs to send 1 stream to the SFU, which forwards it to the other peers in the room.
Simulcast: If the stream is a simulcast stream, the SFU can forward streams at different bitrates like an MCU, but with less CPU cost because there is no transcoding.
Protocol Converter: The SFU can also convert WebRTC to other protocols, for example publishing to YouTube over RTMP.
DVR: Record the WebRTC stream as a VoD file, such as an MP4 file.
Network Quality: The SFU provides better network quality, especially when P2P cannot be established.
Firewall Traversal: For peers behind an enterprise firewall, UDP might not be available; the SFU can use the HTTP (TCP/80) or HTTPS (TCP/443) port.
Forward
The default model of WebRTC is P2P like this:
PeerA(WebRTC/Chrome) --------> PeerB(WebRTC/Chrome)
PeerB(WebRTC/Chrome) --------> PeerA(WebRTC/Chrome)
If you have three participants in a room:
PeerA(WebRTC/Chrome) --------> PeerB(WebRTC/Chrome)
PeerA(WebRTC/Chrome) --------> PeerC(WebRTC/Chrome)
PeerB(WebRTC/Chrome) --------> PeerA(WebRTC/Chrome)
PeerB(WebRTC/Chrome) --------> PeerC(WebRTC/Chrome)
PeerC(WebRTC/Chrome) --------> PeerA(WebRTC/Chrome)
PeerC(WebRTC/Chrome) --------> PeerB(WebRTC/Chrome)
In the P2P model, each peer needs to send N-1 streams and receive N-1 streams from the other peers, which requires a lot of upload bandwidth.
An SFU can forward the streams to the other peers, like this:
PeerA(WebRTC/Chrome) ---> SFU --+--> PeerB(WebRTC/Chrome)
                                +--> PeerC(WebRTC/Chrome)
PeerB(WebRTC/Chrome) ---> SFU --+--> PeerA(WebRTC/Chrome)
                                +--> PeerC(WebRTC/Chrome)
PeerC(WebRTC/Chrome) ---> SFU --+--> PeerA(WebRTC/Chrome)
                                +--> PeerB(WebRTC/Chrome)
In the SFU model, each peer only needs to send 1 stream and receives N-1 streams, so this model scales better than P2P, especially when there are more peers in a room.
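
A quick back-of-the-envelope comparison makes the difference obvious (the numbers are illustrative):

// Per-peer bandwidth, mesh (P2P) vs SFU, for N peers each publishing one stream.
const peers = 10;          // N participants in the room
const bitrate = 1_000_000; // 1 Mbps per published stream

console.log('P2P upload per peer  :', (peers - 1) * bitrate / 1e6, 'Mbps'); // 9 Mbps
console.log('SFU upload per peer  :', bitrate / 1e6, 'Mbps');               // 1 Mbps
console.log('SFU download per peer:', (peers - 1) * bitrate / 1e6, 'Mbps'); // 9 Mbps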
Simulcast
Because each peer's network is different, the SFU can use simulcast to send different bitrates to different peers. It works like this:
PeerA(WebRTC/Chrome) --1Mbps-> SFU --+--1Mbps----> PeerB(WebRTC/Chrome)
                                     +--500Kbps--> PeerC(WebRTC/Chrome)
Because PeerC's network is worse, the SFU sends it the stream at a bitrate of 500 Kbps.
Please note that this requires PeerA to use the AV1 codec; H.264 is not supported by default, so it's not a perfect solution.
It's also complex, and PeerC might not want a low-bitrate stream if it can accept higher latency instead, so this solution does not always work.
Note: Simulcast is not the same as an MCU, which requires a lot of CPU for transcoding. An MCU converts the streams in a room into 1 stream for a peer to receive, so it's used in scenarios such as embedded SIP devices, which only receive 1 stream with video and audio.
There are lots of SFU servers that can do this, for example SRS, mediasoup, Janus and Licode.
Note: Right now (2023.02), the SRS simulcast feature has not been merged into develop; it's in a feature branch.
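
If your SFU happens to be mediasoup, layer selection on the server side looks roughly like this; a sketch only, where transport, producer and rtpCapabilities come from your own signaling exchange.

// Sketch: consume PeerA's simulcast video for PeerC, preferring the lowest layer.
async function consumeLowLayer(transport, producer, rtpCapabilities) {
  const consumer = await transport.consume({
    producerId: producer.id, // PeerA's video producer
    rtpCapabilities,         // PeerC's RTP capabilities, received over signaling
    paused: true,
  });
  await consumer.setPreferredLayers({ spatialLayer: 0 }); // roughly the 500 Kbps case above
  await consumer.resume();
  return consumer;
}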
Protocol Converter
Sometimes you want to convert WebRTC to live streaming, for example opening a web page and publishing the camera stream to YouTube.
How do you do that with an SFU? It works like this:
Chrome --WebRTC------> SFU ---RTMP------> YouTube/Twitch/TikTok
       (H.264+OPUS)        (H.264+AAC)
In this model, the SFU only needs to convert the audio stream from OPUS to AAC; the video stream passes through untouched, because both WebRTC and RTMP use H.264.
Because of the audio transcoding, only a few SFU servers can do this, for example SRS and Janus.
Note: Janus needs ffmpeg to convert the RTP packets, while SRS does this natively, so it's easier to use.
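
For example, one common pattern is to have the SFU (or a helper process) expose the WebRTC media as plain RTP described by an SDP file and let ffmpeg do the OPUS-to-AAC leg. A hedged Node sketch, where the SDP file name, the stream key and the RTP hand-off are all assumptions:

// Sketch: push SFU-exported RTP to an RTMP ingest, transcoding only the audio.
const { spawn } = require('child_process');

const ffmpeg = spawn('ffmpeg', [
  '-protocol_whitelist', 'file,udp,rtp',
  '-i', 'stream.sdp',   // SDP describing the RTP ports the SFU writes to
  '-c:v', 'copy',       // H.264 passes through untouched
  '-c:a', 'aac',        // OPUS -> AAC for RTMP
  '-f', 'flv',
  'rtmp://a.rtmp.youtube.com/live2/STREAM_KEY', // placeholder stream key
]);

ffmpeg.stderr.on('data', (chunk) => process.stderr.write(chunk));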
DVR
An SFU can also DVR WebRTC streams to an MP4 file, for example:
Chrome ---WebRTC---> SFU ---DVR--> MP4
This enables you to use a web page to upload an MP4 file, for example to allow a user to record a clip from their camera as feedback for your product.
Similar to live streaming, MP4 files support AAC better, so the SFU needs to convert OPUS to AAC.
Because of the audio transcoding, only a few SFU servers can do this, for example SRS.
Note: I'm not sure which other SFU servers support this; please let me know if I missed something.
Network Quality
On the internet, the SFU model is better than the P2P model. Consider the flow below:
PeerA <----Internet--> PeerB
P2P seems simple and efficient, but there are actually lots of routers and network devices (generally servers) in between, so the flow in the P2P model is really:
PeerA <--------Internet----------> PeerB
          Routers, Servers, etc.
From the perspective of network transport, the SFU model is similar:
PeerA <--------SFU-Server----------> PeerB
           Routers, Servers, etc.
SFU network quality is better than P2P not because of the server itself, but because you are able to control the transport network by using dedicated servers and even a dedicated network.
But with P2P you can't control the routers and servers; all peers are just clients.
Note: The TURN server model also improves network quality, but an SFU is still better because you can run QoS algorithms such as GCC on the SFU; the SFU server is itself a WebRTC endpoint, while TURN is just a proxy.
Note: An SFU cluster, built from a set of SFU servers, can also improve quality when peers are in different countries.
Firewall Traversal
For some users behind an enterprise firewall, UDP is not available:
         Firewall
            |
Chrome -----X---WebRTC--- Chrome(PeerB)
PeerA       |   (UDP)
Even worse, some firewalls only allow HTTP (TCP/80) or HTTPS (TCP/443). So we can use an SFU that listens on HTTP (TCP/80) or HTTPS (TCP/443); it works like this:
         Firewall
            |
Chrome -----+---WebRTC-------> SFU ---> Chrome(PeerB)
PeerA       |  (TCP 80/443)
Note: Yes, a TURN server such as coturn can also solve this problem, but note that a TURN server usually allocates a range of ports rather than fixed ones, so a TURN server is not as easy to use as an SFU server.
Only a few SFU servers can do this, for example SRS and mediasoup.
Note: I'm not sure which other SFU servers support this; please let me know if I missed something.
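
For reference, the TURN fallback mentioned in the note above is configured entirely on the client; a browser-side sketch, with hostname and credentials as placeholders:

// Force media through a TURN server listening on TCP/443.
const pc = new RTCPeerConnection({
  iceServers: [{
    urls: 'turns:turn.example.com:443?transport=tcp',
    username: 'demo',
    credential: 'secret',
  }],
  iceTransportPolicy: 'relay', // skip host/srflx candidates, relay only
});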

Mix multiple RTP streams into a single one

I am trying to build a basic conference call system based on plain RTP.
                    _______
RTP IN #1 ------>  |       |  ------> MIX RTP receiver #1
                   |  MIX  |
                   |  RTP  |
RTP IN #2 ------>  |_______|  ------> MIX RTP receiver #2
I am creating RTP streams on Android via the AudioStream class and using a server written in Node.js to receive them.
The naive approach I've been using is that the server receives the UDP packets and forwards them to the participants of the conversation. This works perfectly as long as there are two participants, and it's basically the same as if the two were sending their RTP stream to each other.
I would like this to work with multiple participants, but forwarding the RTP packets as they arrive to the server doesn't seem to work, probably for obvious reasons. With more than two participants, delivering the packets coming in from the different sources to each participant (excluding the sender of each packet) results in completely broken audio.
Without changing the topology of the network (star rather than mesh), I presume the server will need to perform some operations on the packets in order to produce a single output RTP stream containing the mixed input RTP streams.
I'm just not sure how to go about doing this.
In your case I know two options:
MCU or Multipoint Control Unit
Or RTP simulcast
MCU (Multipoint Control Unit)
This is a middle box (network element) that takes several RTP streams and generates one or more RTP streams.
You can implement it yourself but it is not trivial, because you need to deal with:
Stream decoding (and therefore you need a jitter buffer and codec implementations)
Stream mixing - so you need some synchronisation between streams (collect some data from source 1 and source 2, mix them and send to destination 3)
There are also several projects that can do it for you (like Asterisk, FreeSWITCH, etc.); you could try to write an integration layer with them. I haven't heard of anything for Node.js.
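
To make the "stream mixing" part concrete: once the streams are decoded and time-aligned, the core of it is just summing samples with clipping. An illustrative sketch for 16-bit little-endian PCM:

// Illustrative only: a real MCU also needs jitter buffers, Opus/PCMU decoding,
// timestamp alignment and re-encoding around this step.
function mixPcm16(a, b) {
  const out = Buffer.alloc(Math.min(a.length, b.length));
  for (let i = 0; i < out.length; i += 2) {
    const sum = a.readInt16LE(i) + b.readInt16LE(i);
    out.writeInt16LE(Math.max(-32768, Math.min(32767, sum)), i); // clamp to int16 range
  }
  return out;
}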
Simulcast
This is pretty new technology and its specification is available only in IETF drafts. The core idea is to send several RTP streams inside one RTP session simultaneously.
When the destination receives several RTP streams it needs to do exactly what an MCU does - decode all the streams and mix them together - but in this case the destination may use a hardware audio mixer to do that.
The main con of this approach is the bandwidth to the client device. If you have N participants you need to:
either send all N streams to all the others,
or select streams based on some metadata like voice activity or audio level.
The first is not efficient; the second is very tricky.
The options suggested by Dimtry's answer were not feasible in my case because:
The middle-box solution is difficult to implement, requires too many resources, or forces a dependency on an external piece of software, which I didn't want, especially because the Android RTP stack should work out of the box with only basic support from a server component (mostly for hole punching)
The simulcast solution cannot be used because the Android RTP package cannot handle it and, as far as my understanding goes, it's only capable of handling simple RTP streams
Other options I've been evaluating:
SIP
Android supports it, but it's more of a high-level feature and I wanted to build the solution into my own custom application without relying on the additional abstractions introduced by a high-level protocol such as SIP. Also, this felt just too complex to set up, and conferencing doesn't even seem to be a core feature but rather an extension
WebRTC
This is supposed to be the de facto standard for peer-to-peer voice and video conferencing, but looking through code examples it just looks too difficult to set up. It also requires server support for hole punching.
Our solution
Even though I had, and still have, little experience in this area, I thought there must be a way to make it work using plain RTP and some support from a simple server component.
The server component is necessary for hole punching, otherwise getting the clients to talk to each other is really tricky.
So what we ended up doing for conference calling is having the caller act as the mixer and the server component act as the middle-man that delivers RTP packets to the participants.
In practice:
whenever an N-user call is started, we instantiate N-1 simple UDP broadcast servers, listening on N-1 different ports
We send those N-1 ports to the initiator of the call via a signaling mechanism built on socket.io and 1 port to each of the remaining participants
The server component listening on those ports simply acts as a relay: whenever it receives a UDP packet containing RTP data, it forwards it to all the connected clients (the sockets it has seen so far) except the sender (see the sketch at the end of this answer)
The initiator of the call will receive and send data to the other participants, mixing it via the Android AudioGroup class
The participants will send data only to the initiator of the call, and they will receive the mixed audio (together with the caller's own voice and the other participants' voices) on the server port that has been assigned to them
This allows for a very simple implementation on both the client and the server side, with minimal signaling work required. It's certainly not a bullet-proof conferencing solution, but given its simplicity and feature completeness (especially regarding common network issues like NAT traversal, which with a server's help is basically a non-issue), it is in my opinion better than writing lots of resource-hungry server-side mixing code, relying on external software like SIP servers, or using protocols like WebRTC, which achieve basically the same thing with a lot more implementation effort.
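
For reference, the relay part of the server component is small. A minimal sketch of the kind of UDP relay described above, using Node's dgram module; the port numbers and the "clients seen so far" policy are simplified assumptions:

// One relay per port: forward every incoming RTP packet to all other clients
// seen on this port, never back to the sender.
const dgram = require('dgram');

function createRelay(port) {
  const socket = dgram.createSocket('udp4');
  const clients = new Map(); // "address:port" -> { address, port }

  socket.on('message', (packet, rinfo) => {
    const key = `${rinfo.address}:${rinfo.port}`;
    if (!clients.has(key)) clients.set(key, { address: rinfo.address, port: rinfo.port });

    for (const [k, peer] of clients) {
      if (k !== key) socket.send(packet, peer.port, peer.address);
    }
  });

  socket.bind(port, () => console.log(`RTP relay listening on UDP ${port}`));
  return socket;
}

// N-1 relays for an N-user call; the ports are placeholders.
[40000, 40001].forEach(createRelay);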

Wowza WebRTC: our own voice coming back - feedback or echo?

We have deployed WebRTC on Wowza. However, we are getting our own voice back. Could it be feedback or echo?
As far as I can see, there is currently no method in Wowza to combat echo. However, you can install extra layers of audio filtering - for example, this article shows how to use PBXMate for echo cancellation. In case the link becomes invalid, the full requirements are the following:
The Flashphoner client is a Flash-based client. It could be replaced by other Flash-based clients.
The Wowza server is a standard streaming server.
The Flashphoner is responsible for translating the protocol of the streaming data to the standard SIP protocol.
The Elastix server is a well known unified communication server.
The PBXMate is an Elastix AddOn for audio filtering.

How to create an RTSP streaming server

So I am trying to create an RTSP server that streams music.
I do not understand how the server plays a music file and how different requests get whatever is playing at that time.
So, to organize my questions:
1) How does the server play a music file?
2) What does the request to the server look like to get what's currently playing?
3) What does the response look like that gets the music playing in the client that requested it?
First: READ THIS (RTSP), and THEN READ THIS (SDP), and then READ THIS (RTP). Then you can ask more sensible questions.
It doesn't; the server streams little parts of the audio data to the client, telling it when each part is to be played.
There is no such request. If you want, you can have a URL for live streaming and, in response to the RTSP DESCRIBE request, tell the client what is currently on.
Read the first (RTSP) document; it's all there! The answer to your question is this:
RTSP/1.0 200 OK
CSeq: 3
Session: 123456
Range: npt=now-
RTP-Info: url=trackID=1;seq=987654
But to get the music playing you will have to do a lot more to initiate a streaming session.
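
For completeness, the requests leading up to that point typically look like the following; the URL, client ports and session ID are placeholders, and the Session value would come from the server's SETUP response:

DESCRIBE rtsp://server.example.com/live/music RTSP/1.0
CSeq: 2
Accept: application/sdp

SETUP rtsp://server.example.com/live/music/trackID=1 RTSP/1.0
CSeq: 3
Transport: RTP/AVP;unicast;client_port=8000-8001

PLAY rtsp://server.example.com/live/music RTSP/1.0
CSeq: 4
Session: 123456
Range: npt=now-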
You should first be clear about what RTSP and RTP are. The Real Time Streaming Protocol (RTSP) is a network control protocol designed for use in communication systems to control streaming media servers, whereas most RTSP servers use the Real-time Transport Protocol (RTP) for the media stream delivery itself. RTP usually uses UDP to deliver the packet stream. Try to understand these concepts first.
Then have a look at this project:
http://sourceforge.net/projects/unvedu/
This is an open source project developed by our university, which is used to stream video (MKV) and audio files over UDP.
You can also find a .NET implementation of RTP and RTSP at https://net7mma.codeplex.com/, which includes an RTSP client and server implementation and many other useful utilities, e.g. implementations of many popular digital media container formats.
The solution has a modular design and better performance than ffmpeg or libav at the current time.
