I have a couple of questions:
1) what is the max number of users that can receive video?
2) Is it possible only to watch remote streams without access to my camera/microphone? Imagine that I only want to watch debate between Dawkins and Pope Francis. :)
Regards
Answer to #1:
The maximum number of users that can be simultaneously sending video to each other is limited by the capabilities of the hardware to encode and decode video streams. There is no hard limit.
If you are looking to do a single sender and multiple receivers, then again you are limited by the local machine. The sender will need to encode a separate stream for each receiver since the available bandwidth to each receiver will be different and impact the quality of video that can be sent.
Answer to #2:
You do not have to send audio and video. Even if you give permission to access your camera and microphone, you can later mute them (https://vline.com/developer/docs/vline.js/vline.MediaStream)
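In plain browser terms, "muting" just means disabling the local tracks. A minimal sketch with the standard MediaStream API (the vline.MediaStream linked above gives you the same idea through its own API):

```typescript
// Toggle the local camera/microphone by disabling their tracks.
// (Standard browser API sketch, not vLine-specific code.)
const stream = await navigator.mediaDevices.getUserMedia({ video: true, audio: true });

function setMuted(muted: boolean): void {
  for (const track of stream.getTracks()) {
    track.enabled = !muted; // a disabled track sends silence / black frames
  }
}

setMuted(true);  // stop sending audio and video
setMuted(false); // resume
```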
Also, take a look at this page for some more thoughts on this: https://www.webrtc-experiment.com/webrtc-broadcasting/
In an SFU audio conferencing platform, the media server simply routes audio packets. Let's say that on the client side I keep an audio packet queue for each participant present (updated by the signaling server), and at a certain rate I dequeue from every queue, pick the top 4-6 voice packets, and mix them for playback. If a sequence number is missing for some participant, I send a NACK and wait up to a threshold time before dequeuing that participant's queue (to keep the voice flowing).
But to make this solution scalable, I have to move this dequeue-and-pick-the-top-4-6 step to the media server and send the result to everyone. Now, on the client side, if a participant's sequence number is missing, I can't tell whether the packet was actually lost or simply didn't make it into the server's top 4-6 selection (and I only want to send a NACK and wait if it was really lost).
How can I handle this use case efficiently? Any suggestion about how many streams to mix, or anything else, is highly appreciated.
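For reference, the server-side step I have in mind is roughly this (just a sketch; the names and the audio-level field are made up):

```typescript
// Per tick: dequeue one packet per participant, rank by audio level,
// and forward only the top N. Everyone else's packet is intentionally
// dropped, which is exactly what clients can't distinguish from real loss.
interface AudioPacket {
  participantId: string;
  sequenceNumber: number;
  audioLevel: number; // e.g. taken from an RTP audio-level header extension
  payload: Uint8Array;
}

const TOP_N = 6;
const queues = new Map<string, AudioPacket[]>(); // one queue per participant

function mixTick(): AudioPacket[] {
  const heads: AudioPacket[] = [];
  for (const queue of queues.values()) {
    const packet = queue.shift();
    if (packet) heads.push(packet);
  }
  return heads
    .sort((a, b) => b.audioLevel - a.audioLevel)
    .slice(0, TOP_N);
}
```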
I need some help.
What is the best way to set up LIVE STREAMING over the web from my WEBCAM to the server and back to multiple users?
Essentially I'm trying to create a group video chat application that can support many users.
I don't want it to be peer to peer webRTC.
I actually managed to make it work with getUserMedia() -> MediaRecorder -> ondataavailable -> pass blob chunks to Node.js via socket.io -> socket.io sends the blob chunks back to the other connected users -> append those chunks to a SourceBuffer that's connected to a MediaSource that's set as the source URL on a <video> element.
And it actually worked! BUT it's so slow and laggy and resource intensive. These chunks get passed around at something like 20 per second, and that slows the page down a lot. I don't think you're supposed to pass that many blobs to the SourceBuffer that quickly. Just as a test, I tried saving MediaRecorder recordings every 3 seconds (so it's less resource intensive) and passing those WebM blobs to the SourceBuffer, but for some reason only the first WebM loads, and the others never get appended or start playing.
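For reference, the pipeline above looks roughly like this (just a sketch; the 'video-chunk' event name, codec strings, and chunk handling are my own placeholders):

```typescript
// Browser-side sketch of the getUserMedia -> MediaRecorder -> socket.io ->
// MediaSource pipeline described above.
import { io } from 'socket.io-client';

const socket = io();

// Sender: capture the webcam and relay MediaRecorder chunks through socket.io.
async function startSending(): Promise<void> {
  const stream = await navigator.mediaDevices.getUserMedia({ video: true, audio: true });
  const recorder = new MediaRecorder(stream, { mimeType: 'video/webm; codecs=vp8,opus' });
  recorder.ondataavailable = (e) => {
    if (e.data.size > 0) socket.emit('video-chunk', e.data); // one Blob per timeslice
  };
  recorder.start(50); // ~20 chunks per second, as described above
}

// Receiver: append relayed chunks to a SourceBuffer behind a <video> element.
function startReceiving(video: HTMLVideoElement): void {
  const mediaSource = new MediaSource();
  video.src = URL.createObjectURL(mediaSource);
  mediaSource.addEventListener('sourceopen', () => {
    const sourceBuffer = mediaSource.addSourceBuffer('video/webm; codecs="vp8,opus"');
    socket.on('video-chunk', async (chunk: ArrayBuffer | Blob) => {
      // Depending on the transport, the chunk may arrive as an ArrayBuffer or a Blob.
      const data = chunk instanceof Blob ? await chunk.arrayBuffer() : chunk;
      // A real implementation would queue chunks and append on 'updateend';
      // this sketch just drops a chunk if the buffer is still busy.
      if (!sourceBuffer.updating) sourceBuffer.appendBuffer(data);
    });
  });
}
```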
It just can't work for a production app this way.
What's the "RIGHT" way to do this?
How do I properly pass a video stream from a webcam to a Node.js server?
And how do I stream this live stream from the Node.js server back out to the web so that we can have a group video chat?
I'm a bit lost. Please help.
Do I use HLS? RecordRTC?
Do I stream from Node.js via http or via socket.io?
There are services that already let you do this easily, like the Vonage Video API (TokBox), but those seem to be very expensive?
I want to run the video streaming through my own Node.js server that I control.
What's the best way to do this?
Please help.
Thank you
Essentially I'm trying to create a group video chat application that can support many users.
I don't want it to be peer to peer webRTC.
Video chat requires low latency, and therefore requires usage of WebRTC. Remember that one of the "peers" can actually be a server.
And it actually worked! BUT it's so slow and laggy and resource intensive.
Video encoding/decoding is resource intensive no matter how you do it. If by "slow" and "laggy" you mean high latency, then yes: recording chunks, sending chunks, and decoding chunks will have higher latency by their very nature. Additionally, what you're describing won't drop frames or dynamically adjust the encoding, so if a connection can't keep up, it's just going to buffer until it can. This is a different sort of tradeoff than what you want.
Again, for a video chat, realtime-ness is more important than quality and reliability. If that means discarding frames, resampling audio stupid-fast to catch up, encoding at low bitrates, even temporarily dropping streams entirely for a few seconds, that's what needs to happen. This is what the entire WebRTC stack does.
These chunks get passed around at something like 20 per second, and that slows the page down a lot. I don't think you're supposed to pass that many blobs to the SourceBuffer that quickly.
No, this is unlikely to be your problem. The receiving end probably just can't keep up with decoding all of these streams.
Do I use HLS?
Not for anyone actively participating in the chat... people who require low latency. For everyone else, yes you can utilize HLS and DASH to give you a more affordable way to distribute your stream over existing CDNs. See this answer: https://stackoverflow.com/a/37475943/362536 Basically, scrutinize your requirements and determine if everyone is actually participating. If they aren't, move them to a cheaper streaming method than WebRTC.
RecordRTC?
No, this is irrelevant to your project and frankly I don't know why people keep using this library for anything. Maybe they have some specific use case for it I don't know about, but browsers have had built-in MediaRecorder for years.
There are services that already let you do this easily, like the Vonage Video API (TokBox), but those seem to be very expensive?
This is an expensive thing to do. I think you'll find that using an existing service that already has the infrastructure ready to go is going to be cheaper than doing it yourself in most cases.
I followed this example and managed to collect the audio buffers from my microphone and send them to Dialogflow.
https://cloud.google.com/dialogflow-enterprise/docs/detect-intent-stream
But this processing is sequential: I first have to collect all the audio buffers before I can send them to Dialogflow.
Then I get the correct result and also the intermediate results.
But only after the person has stopped talking can I send the collected audio buffers to Dialogflow.
I would like to send (stream) the audio buffers to Dialogflow instantly, while somebody is still talking, and get the intermediate results right away.
Does anybody know if this is possible, and can you point me in the right direction?
My preferred language is Python.
Thanks a lot!
I got this Answer from the Dialogflow support team:
From the Dialogflow documentation: Recognition ceases when it detects that the audio's voice has stopped or paused. In this case, once a detected intent is received, the client should close the stream and start a new request with a new stream as needed. This means that the user has to stop/pause speaking in order for you to send the audio to Dialogflow.
In order for Dialogflow to detect a proper intent, it has to have the full user utterance.
If you are looking for real-time speech recognition, look into our Speech-to-Text product (https://cloud.google.com/speech-to-text/).
While trying to do something similar recently, I found that someone already had this problem and figured it out. Basically, you can feed an audio stream to DialogFlow via the streamingDetectIntent method and get intermediate results as valid language is recognized in the audio input. The tricky bit is that you need to set a threshold on your input stream so that the stream is ended once the user stops talking for a set duration. The closing of the stream serves the same purpose as reaching the end of an audio file, and triggers the intent matching attempt.
The solution linked above uses SoX to stream audio from an external device. The nice thing about this approach is that SoX already has options for setting audio level thresholds to start/stop the streaming process (look at the silence option), so you can fine-tune the settings to work for your needs. If you're not using NodeJS, you may need to write your own utility to handle initiating the audio stream, but hopefully this can point you in the right direction.
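A rough sketch of that streaming call with the Node client (assuming the @google-cloud/dialogflow package, and that micStream is already a Readable of LINEAR16 PCM at 16 kHz, e.g. produced by SoX; projectId and sessionId are placeholders):

```typescript
import { SessionsClient } from '@google-cloud/dialogflow';
import { Readable } from 'stream';

async function streamToDialogflow(micStream: Readable, projectId: string, sessionId: string) {
  const client = new SessionsClient();
  const sessionPath = client.projectAgentSessionPath(projectId, sessionId);

  const detectStream = client
    .streamingDetectIntent()
    .on('error', console.error)
    .on('data', (response) => {
      if (response.recognitionResult) {
        // Intermediate transcript while the user is still talking.
        console.log(`Partial: ${response.recognitionResult.transcript}`);
      }
      if (response.queryResult) {
        // Final intent match, produced once the stream has been ended.
        console.log(`Intent: ${response.queryResult.intent?.displayName}`);
      }
    });

  // The first request carries only the config; later ones carry raw audio.
  detectStream.write({
    session: sessionPath,
    queryInput: {
      audioConfig: {
        audioEncoding: 'AUDIO_ENCODING_LINEAR_16',
        sampleRateHertz: 16000,
        languageCode: 'en-US',
      },
    },
  });

  micStream.on('data', (chunk: Buffer) => detectStream.write({ inputAudio: chunk }));
  // Ending the stream (e.g. when SoX's silence detection fires) is what
  // triggers the final intent match.
  micStream.on('end', () => detectStream.end());
}
```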
I have two network cameras that support RTSP, ONVIF v2.0, and a few other protocols (the full list is in the link above). I want to read frames from these two streams at the same instant (or at least within a few ms) so that I can get a better view of my place by combining the information from the two images, as if I were using a stereo camera pair, and adding some intelligence on top of it.
So far, I've looked into RTSP and found that the RTP packet header has this information (Source), and that I can use the NTP timestamps from RTCP sender reports, but I'm not really sure how to use these to get absolute timestamps per frame. I'm using Node.js (the rtsp-ffmpeg library) to retrieve frames from the RTSP streams. I could use ONVIF, but I didn't find any clear way to get a per-frame timestamp or to synchronize the videos so that I read frames for the same client timestamp with ONVIF either (ONVIF v2.6 specs).
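From what I've gathered so far, the per-frame mapping would be something like this, but I'm not sure it's right (a sketch that assumes a 90 kHz video clock and ignores 32-bit RTP timestamp wraparound):

```typescript
// Map a frame's RTP timestamp to wall-clock time using the most recent
// RTCP sender report (SR), which pairs an RTP timestamp with an NTP time.
// srNtpMs is assumed to be the SR's NTP time already converted to Unix ms.
function rtpToWallClockMs(
  frameRtpTimestamp: number, // RTP timestamp of the frame
  srRtpTimestamp: number,    // RTP timestamp carried in the sender report
  srNtpMs: number,           // NTP time from the same sender report, in ms
  clockRate = 90000          // 90 kHz for video payloads
): number {
  const deltaTicks = frameRtpTimestamp - srRtpTimestamp;
  return srNtpMs + (deltaTicks / clockRate) * 1000;
}
```

With one such timestamp per camera, I could then pair frames from the two streams by picking the closest wall-clock times.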
I'm looking to set up a server which will read from some audio input device and serve that audio continuously to clients.
I don't necessarily need the audio to be played by the client in real time; I just want the client to be able to start downloading from the point at which they join and stop at the point they leave.
So say the server broadcasts 30 seconds of audio data, a client could connect 5 seconds in and download 10 seconds of it (giving them 0:05 - 0:15).
Can you do this kind of partial download over TCP starting at whenever the client connects and end up with a playable audio file?
Sorry if this question is a bit too broad and not a 'how do I set variable x to y' kind of question. Let me know if there's a better forum to post this in.
Disconnect the concepts of file and connection. They're not related. A TCP connection simply supports the reliable transfer of data, nothing more. What your application chooses to send over that connection is its business, so you need to set up your application so that it sends the data you want.
It sounds like what you want is a simple HTTP progressive internet radio stream, which is commonly provided by SHOUTcast and Icecast servers. I recommend Icecast to get started. The user connects, they get a small buffer of a few seconds up front to get them started (optional), and when they disconnect, that's it.
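If you do want to roll it yourself, the core idea of "each client receives the stream from the moment they connect" is small. A sketch in Node, assuming liveAudioSource is a Readable of already-encoded MP3 data (Icecast/SHOUTcast add the source handling, burst buffer, and metadata for you):

```typescript
// Fan a live, already-encoded MP3 stream out to whoever is connected right now.
import { createServer, ServerResponse } from 'http';
import { Readable } from 'stream';

function serveLiveAudio(liveAudioSource: Readable): void {
  const clients = new Set<ServerResponse>();

  // Every client connected at this moment receives this chunk;
  // nobody gets data from before they connected.
  liveAudioSource.on('data', (chunk: Buffer) => {
    for (const res of clients) res.write(chunk);
  });

  createServer((req, res) => {
    res.writeHead(200, { 'Content-Type': 'audio/mpeg' });
    clients.add(res);
    req.on('close', () => clients.delete(res)); // stop sending when they leave
  }).listen(8000);
}
```

Most players simply resync on the next MP3 frame header, which is why joining mid-stream still yields playable audio.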