I have two network cameras that support RTSP, ONVIF v2.0, and a few other protocols (the full list is in the link above). I want to read frames from these two streams at the same instant (or at least within a few ms of each other) so that I can get a better view of my place by combining the information from the two images, as if I were using a stereo camera pair, and adding some intelligence on top of it.
So far, I've looked into RTSP and found that the RTP packet header carries this information (Source), and that I can use the NTP timestamp from RTCP sender reports, but I'm not really sure how to use these to get absolute timestamps per frame. I'm using Node.js (the rtsp-ffmpeg library) to retrieve frames from the RTSP stream. I could use ONVIF, but I didn't find any clear way to get a per-frame timestamp or to synchronize the videos so that I read the frames for the same client timestamp with ONVIF either (ONVIF v2.6 Specs).
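For reference, the usual way to turn a frame's RTP timestamp into an absolute wall-clock time is to take the most recent RTCP Sender Report, which pairs an NTP timestamp with the RTP timestamp of the same instant, and extrapolate from there. A minimal sketch of the arithmetic (90000 is the standard video clock rate; the function and field names are mine, not from any particular library):

# Convert a frame's RTP timestamp to an absolute (Unix epoch) time using the
# latest RTCP Sender Report received for the same SSRC.
NTP_EPOCH_OFFSET = 2208988800  # seconds between 1900-01-01 (NTP) and 1970-01-01 (Unix)

def rtp_to_wallclock(rtp_ts, sr_ntp_sec, sr_ntp_frac, sr_rtp_ts, clock_rate=90000):
    """Return a Unix timestamp (float seconds) for the frame carrying rtp_ts."""
    # Absolute time of the sender report, converted from NTP to the Unix epoch.
    sr_unix = sr_ntp_sec - NTP_EPOCH_OFFSET + sr_ntp_frac / 2**32
    # Elapsed media time since that report, handling 32-bit RTP timestamp wrap-around.
    delta = (rtp_ts - sr_rtp_ts) & 0xFFFFFFFF
    if delta > 0x7FFFFFFF:          # the frame actually predates the report
        delta -= 0x100000000
    return sr_unix + delta / clock_rate

Note that this only yields comparable timestamps across the two cameras if the cameras' own clocks are synchronized (e.g. via NTP or the ONVIF time settings).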
We have a Redis server that all clients attach to for a variety of data transfer and coordination tasks. We have a new requirement that we support video streaming. I would like to avoid running a dedicated service (with all the accompanying network and security requirements that entails) and just stream over Redis.
Redis seems like a good fit for real-time streaming, in particular using Redis streams. I realize that "Redis streams" have no relation to "video streaming"; however, our use case maps well onto the Redis stream structure. We want to buffer X seconds of video continuously, allowing clients to attach to that real-time stream at any time. We have no need to store history or serve static video content.
Redis seems like a good solution; my problem is that I don't know how to:
1. stream an appropriate video codec (Motion JPEG maybe?) over Redis
2. join a stream mid-broadcast (joining at a keyframe perhaps?)
3. serialize the stream to bytes at the server (Python based) and deserialize it into a video codec and player on the client (a browser) - perhaps it's as simple as serialization/deserialization in OpenCV or equivalent and I'm just overthinking it?
These are all features I would typically look to an API to perform, but is there an API capable of this? I'm inexperienced in the field of video streaming.
At a high level, I prefer viewing streaming as a pub-sub problem, where producers produce chunks of information and consumers read that information on an as-needed basis.
A ready-made solution may not be available, so we may need to perform the following steps (a code sketch follows the lists below):
Publish:
1. chunk-id : content
2. chunk-id-fwd : (nextChunkId)
3. videoId : latestChunkId (assuming your real-time use case is live streaming, this can back a 'go live' button for users)
Consume:
Start:
1. Get the latest chunk id (from videoId)
2. Get the content for that chunk id
3. Get nextChunkId from chunk-id-fwd and repeat
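As a rough illustration of the steps above, here is a minimal sketch using redis-py and OpenCV: the producer JPEG-encodes each frame and XADDs it to a capped stream (so only the last few seconds are buffered), and a consumer can join mid-broadcast by grabbing the newest entry and then blocking for the ones that follow. The stream name, field name, and 5-second cap are my own assumptions, not anything standard:

import cv2
import numpy as np
import redis

r = redis.Redis()
STREAM = 'video:cam1'
FPS = 25

def publish(cap):
    # Producer: push JPEG-encoded frames, keeping roughly 5 seconds of history.
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        ok, jpg = cv2.imencode('.jpg', frame)
        if ok:
            r.xadd(STREAM, {'jpg': jpg.tobytes()}, maxlen=5 * FPS, approximate=True)

def consume():
    # Consumer: join at the newest frame, then follow the stream as it grows.
    last = '$'  # '$' means "only entries added after this point"
    latest = r.xrevrange(STREAM, count=1)
    if latest:
        last, fields = latest[0]
        yield cv2.imdecode(np.frombuffer(fields[b'jpg'], np.uint8), cv2.IMREAD_COLOR)
    while True:
        for _, entries in r.xread({STREAM: last}, block=1000):
            for entry_id, fields in entries:
                last = entry_id
                yield cv2.imdecode(np.frombuffer(fields[b'jpg'], np.uint8), cv2.IMREAD_COLOR)

Because every Motion JPEG frame is a self-contained keyframe, joining mid-broadcast needs no special handling; with an inter-frame codec such as H.264 you would instead have to start each consumer at a keyframe.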
I am trying to build a basic conference call system based on plain RTP.
              ______
RTP IN #1 ---|      |--- MIX RTP to receiver #1
             | MIX  |
RTP IN #2 ---|      |--- MIX RTP to receiver #2
             |______|
I am creating RTP streams on Android via the AudioStream class and using a server written in Node.js to receive them.
The naive approach I've been using is that the server receives the UDP packets and forwards them to the participants of the conversation. This works perfectly as long as there are two participants, and it's basically the same as if the two were sending their RTP stream to each other.
I would like this to work with multiple participants, but forwarding the RTP packets as they arrive at the server doesn't seem to work, probably for obvious reasons. With more than two participants, delivering the packets coming in from the different sources to each of the participants (excluding the sender of each packet) results in completely broken audio.
Without changing the topology of the network (star rather than mesh), I presume that the server will need to perform some operations on the packets in order to produce a single output RTP stream containing the mixed input RTP streams.
I'm just not sure how to go about doing this.
In your case I know of two options:
MCU or Multipoint Control Unit
Or RTP simulcast
MCU (Multipoint Control Unit)
This is a middle box (network element) that receives several RTP streams and generates one or more output RTP streams.
You can implement it yourself, but it is not trivial because you need to deal with:
Stream decoding (and therefore you need a jitter buffer and codec implementations)
Stream mixing - so you need some synchronisation between streams (collect some data from source 1 and source 2, mix them, and send to destination 3); a small mixing sketch follows below
There are also several projects that can do this for you (such as Asterisk or FreeSWITCH); you could write some integration layer against them. I haven't heard of anything for Node.js.
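To give an idea of what the mixing step itself involves (leaving aside decoding and the jitter buffer), mixing two decoded chunks is essentially sample-wise addition with clipping. A minimal sketch, assuming 16-bit signed PCM chunks of equal length:

import numpy as np

def mix_pcm16(chunk_a, chunk_b):
    # Mix two equally sized 16-bit signed PCM buffers, clipping to avoid overflow.
    a = np.frombuffer(chunk_a, dtype=np.int16).astype(np.int32)
    b = np.frombuffer(chunk_b, dtype=np.int16).astype(np.int32)
    mixed = np.clip(a + b, -32768, 32767).astype(np.int16)
    return mixed.tobytes()

A real MCU would do this per destination, mixing every source except that destination's own stream, and then re-encode and re-packetize the result as RTP.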
Simulcast
This is a fairly new technology and its specifications are available only as IETF drafts. The core idea is to send several RTP streams inside one RTP session simultaneously.
When the destination receives several RTP streams, it needs to do exactly the same as an MCU does - decode all the streams and mix them together - but in this case the destination may use a hardware audio mixer to do that.
The main drawback of this approach is the bandwidth to the client device. If you have N participants you need to:
either send all N streams to all the others
or select streams based on some metadata like voice activity or audio level
The first is not efficient; the second is tricky (see the sketch below).
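For the second option, the selection is usually driven by a simple energy measure per stream. A rough sketch of picking the K loudest sources from their latest decoded PCM chunks (the data layout here is my own assumption; in practice the RTP audio-level header extension can give you this value without decoding at all):

import numpy as np

def loudest_sources(chunks, k=2):
    # chunks maps a source id to its latest decoded 16-bit signed PCM chunk (bytes).
    # Returns the ids of the k sources with the highest RMS level.
    def rms(pcm):
        samples = np.frombuffer(pcm, dtype=np.int16).astype(np.float64)
        return float(np.sqrt(np.mean(samples ** 2))) if samples.size else 0.0
    return sorted(chunks, key=lambda src: rms(chunks[src]), reverse=True)[:k]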
The options suggested by Dimtry's answer were not feasible in my case because:
The middle box solution is difficult to implement, requires too many resources, or means relying on an external piece of software, which I didn't want to do, especially because the Android RTP stack should work out of the box with only basic support from a server component, mainly for hole punching
The simulcast solution cannot be used because the Android RTP package cannot handle it and, as far as my understanding goes, it is only capable of handling simple RTP streams
Other options I've been evaluating:
SIP
Android supports it, but it's more of a high-level feature, and I wanted to build the solution into my own custom application without relying on the additional abstractions introduced by a high-level protocol such as SIP. It also felt too complex to set up, and conferencing doesn't even seem to be a core feature but rather an extension
WebRTC
This is supposed to be the de facto standard for peer-to-peer voice and video conferencing, but looking through code examples it just looks too difficult to set up. It also requires support from servers for hole punching.
Our solution
Even though I had, and still have, little experience in this area, I thought there must be a way to make it work using plain RTP and some support from a simple server component.
The server component is necessary for hole punching; otherwise, getting the clients to talk to each other is really tricky.
So what we ended up doing for conference calling is have the caller act as the mixer and the server component as the middle-man to deliver RTP packets to the participants.
In practice:
whenever an N-user call is started, we instantiate N-1 simple UDP broadcast servers, listening on N-1 different ports
we send those N-1 ports to the initiator of the call via a signaling mechanism built on socket.io, and one port to each of the remaining participants
The server component listening on those ports will simply act as a relay: whenever it receives a UDP packet containing the RTP data it will forward it to all the connected clients (the sockets it has seen thus far) except the sender
The initiator of the call will receive and send data to the other participants, mixing it via the Android AudioGroup class
The participants will send data only to the initiator of the call, and they will receive the mixed audio (together with the caller's own voice and the other participants' voices) on the server port that has been assigned to them
This allows for a very simple implementation, both on the client and on the server side, with minimal signaling work required. It's certainly not a bulletproof conferencing solution, but given its simplicity and feature completeness (especially regarding common network issues like NAT traversal, which with a server's aid is basically a non-issue), it is in my opinion better than writing lots of code that requires many resources for server-side mixing, relying on external software like SIP servers, or using protocols like WebRTC, which basically achieve the same with a lot more implementation effort.
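For illustration, the per-port relay described above amounts to very little code. Our server is written in Node.js, but here is an equivalent minimal sketch in Python of one such relay (the port number is arbitrary):

import socket

def run_relay(port):
    # Forward every UDP packet to all peers seen on this port, except its sender.
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(('0.0.0.0', port))
    peers = set()  # addresses learned from incoming packets (this is the hole punching)
    while True:
        data, sender = sock.recvfrom(2048)
        peers.add(sender)
        for peer in peers:
            if peer != sender:
                sock.sendto(data, peer)

if __name__ == '__main__':
    run_relay(50000)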
I am involved in building a Real Time Messaging Protocol (RTMP) parser. I am collecting the video/audio data from the RTMP packets. Now, to play a video in any player, I need to know the container format as well as the codec used. From the video data I am getting in the RTMP packets I know the codec used (e.g. On2 VP6), but I don't know the container of the audio/video stream that I am receiving. So should I assume that RTMP supports only the FLV container? Or is it possible to get audio/video packets from other container formats? If yes, how do I determine the type of container from the information present in the RTMP packets? Adobe's specification for RTMP does not provide any information regarding the container of the audio/video data. Any help on this? I've been stuck here for quite some time.
The question is slightly misdirected.
RTMP is a transport protocol that includes containers inside.
Technically it is not correct to say that RTMP carries FLV, because FLV has two layers of encapsulation and RTMP carries only the bottom level.
So it is right to say that RTMP can transfer only those codecs that FLV can, but it is not 100% right to say that RTMP transfers FLV.
Adobe's specification of RTMP was created not for developers but for a legal issue against Wowza, so it is not written to help you understand what is happening. Read the sources of red5, crtmp, or some other RTMP server; they are rather easy to understand.
I have a couple of questions:
1) what is the max number of users that can receive video?
2) Is it possible to only watch remote streams, without access to my camera/microphone? Imagine that I only want to watch a debate between Dawkins and Pope Francis. :)
Answer to #1:
The maximum number of users that can be simultaneously sending video to each other is limited by the capabilities of the hardware to encode and decode video streams. There is no hard limit.
If you are looking to do a single sender and multiple receivers, then again you are limited by the local machine. The sender will need to encode a separate stream for each receiver since the available bandwidth to each receiver will be different and impact the quality of video that can be sent.
Answer to #2:
You do not have to send audio and video. Even if you give permission to access your camera and microphone, you can later mute them (https://vline.com/developer/docs/vline.js/vline.MediaStream).
Also, take a look at this page for some more thoughts on this: https://www.webrtc-experiment.com/webrtc-broadcasting/
So I am trying to create an RTSP server that streams music.
I do not understand how the server plays music and how different requests get whatever is playing at that time.
So, to organize my questions:
1) How does the server play a music file?
2) What does the request to the server look like to get what's currently playing?
3) What does the response look like to get the music playing in the client that requested it?
First: READ THIS (RTSP), and THEN READ THIS (SDP), and then READ THIS (RTP). Then you can ask more sensible questions.
It doesn't; the server streams small parts of the audio data to the client, telling it when each part is to be played.
There is no such request. If you want, you can have a URL for live streaming and, in response to the RTSP DESCRIBE request, tell the client what is currently on.
Read the first (RTSP) document; it is all there! The answer to your question is this:
RTSP/1.0 200 OK
CSeq: 3
Session: 123456
Range: npt=now-
RTP-Info: url=trackID=1;seq=987654
But to get the music playing you will have to do a lot more to initiate a streaming session.
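To make that concrete, a minimal client session is roughly DESCRIBE, then SETUP, then PLAY, each a plain-text request over TCP (usually port 554). A rough sketch (the server address, track id, and session id are placeholders; real code must parse the SDP and the Session and Transport headers from each reply instead of hard-coding them):

import socket

SERVER, PORT = 'example.com', 554           # placeholder RTSP server
URL = f'rtsp://{SERVER}:{PORT}/music'       # placeholder stream URL

def request(sock, text):
    sock.sendall(text.encode())
    return sock.recv(4096).decode()         # naive: assumes the whole reply fits

sock = socket.create_connection((SERVER, PORT))

print(request(sock, f'DESCRIBE {URL} RTSP/1.0\r\nCSeq: 1\r\nAccept: application/sdp\r\n\r\n'))
# The SDP body in the reply describes the tracks and codecs on offer.

print(request(sock, f'SETUP {URL}/trackID=1 RTSP/1.0\r\nCSeq: 2\r\n'
                    'Transport: RTP/AVP;unicast;client_port=5000-5001\r\n\r\n'))
# The reply's Session header must be echoed in every later request.

print(request(sock, f'PLAY {URL} RTSP/1.0\r\nCSeq: 3\r\nSession: 123456\r\n'
                    'Range: npt=now-\r\n\r\n'))
# 123456 stands in for the Session value returned by SETUP. After PLAY succeeds,
# RTP audio packets start arriving on UDP port 5000.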
You should first be clear about what RTSP and RTP are. The Real Time Streaming Protocol (RTSP) is a network control protocol designed for use in communications systems to control streaming media servers, whereas most RTSP servers use the Real-time Transport Protocol (RTP) for the media stream delivery itself. RTP uses UDP to deliver the packet stream. Try to understand these concepts first.
Then have a look at this project:
http://sourceforge.net/projects/unvedu/
This is an open source project developed by our university, which is used to stream video (MKV) and audio files over UDP.
You can also find a .NET implementation of RTP and RTSP at https://net7mma.codeplex.com/, which includes an RTSP client and server implementation and many other useful utilities, e.g. implementations of many popular digital media container formats.
The solution has a modular design and better performance than ffmpeg or libav at the current time.