How to program an audio/video application on network?

How to program an audio/video application on network? - linux

I want to make (for fun, challenge) a videoconference application, I have some ideas about this:
1) taking the audio/video streams (I don't know what an audio/video stream is)
2) pass this to a server that lets communicate the clients. I can figure out how to write a server(there are a lot of books and documentation about this) but I really don't know how to interact with the webcam and with the audio/video in general.
I want some links, book, suggestions about the basics of digital audio/video expecially on programming. Please help me!!!
I want to make it run on a Linux platform.

Linux makes video grabbing really nice. As long as you have a driver that outputs the video stream to the /dev/video/v* channels. All you have to do is open up a control connection to the device [an exercise for the OP] and then read in the channel like a file [given the parameters set by the control connection. Audio should be the same way, but don't quote me on it.
BTW: Video streaming from a server is a very complex issue. You have to develop or use an existing protocol. You have to be very aware of networking delays, and adjust the information sent (resize or recompress) to the client based on the link size between the client and the server.

Related

How to use `getUserMedia()` api to simulate WebRTC like behaviour?

My primary intention is to setup a VoIP session between 2 users A & B; Here the raw audio / video media bytes are fetched from A's browser are played in B's browser and vice versa.
The reason is that, when the user C & D are added into this call, we need not have to create a P2P mesh network which limits the performance.
Tried recording media with getUserMedia() and playback, but it is not real time. It also gives a bad user experience. (However, haven't experimented yet with videos of small chunks as 200 ms)
Is there any approach where I can get the raw bytes of the media and play it on other browser? Currently I have a server in between which can connect to both peers if required.
Any online examples or libraries are welcome.
Have already asked 2 questions in this regard with 100-100 bounties, but not much of use:
How to use libsrtp or similar library to decrypt/encrypt the WebRTC data stream?
How to integrate part of WebRTC as a static / dynamic library with the existing C++ code?
Related: How to stream, live video playing on my browser to browser of another user?

If i understand you well is you're looking on how to have more than two users on the session right? without using mesh topology
thats possible and configurable as well by means that some maybe active speaker or everyone is active speaker not only receiver whatever configuration you choose but to me it seems that you're asking for video conferencing
there are couple of tools for this the best one i might recommend is mediasoup its a SFU as selective fowarding unit mediasoup

I don't know if I understand correctly, but it is not likely that you will get raw video data and play it on the browser, it will just kill your bandwith and performance because the raw data is huge.
You need to use the compressed data ( media codec ex.H264 ) and you need a protocol to send and receive it. If you are looking for sub-second latency than webrtc is your best choice in here already. If you have a server in between, distribute your media through that server instead of Mesh. Check this out for webrtc network topologies:
https://antmedia.io/webrtc-servers/

Stream music from streaming platform (Deezer, Spotify, Soundcloud) to Web Audio API

Do any of you, know a way to get the audio stream of a music platform and plug it to the Web Audio API ?
I am doing a music visualizer based on the Web Audio API. It currently reads sounds from the mic of my computer and process a real-time visualization. If I play music loud enough, my viz works !
But now I'd like to move on and only read the sound coming from my computer, so that the visualization render only to the music and no other sound such as people chatting.
I know I can buffer MP3 file in that API and it would work perfectly. But in 2020, streaming music is very common, via Deezer, Spotify, Souncloud etc.
I know they all have an API but they often offer an SDK where you cannot really do more than "play" music. There is no easy access to the stream of audio data. Maybe I am wrong and that is why I ask your help.
Thanks

The way to stream music to WebAudio is to use a MediaElementAudioSourceNode or MediaStreamAudioSourceNode. However, these nodes will output zero unless you're allowed to access the data. This means you have to set the CORS property correctly on your end and also requires the server to allow the access through CORS.
A google search will help with setting up CORS. But many sites won't allow access unless you have the right permissions. Then you are out of luck.

I find a "no-code" work around. At least on Ubuntu 18.04, I am able to tell Firefox to take my speakers as the "microphone input".
You just have to select the good "mic" in the list when your browser asks for mic permission.
That solution is very convenient since I do not need to write platform-specific binding-code to access to the audio stream

What libraries/APIs allow me access real time audio waveforms of a phone call?

I am looking to build an app that needs to process incoming audio on a phone call in real time.
WebRTC allows for this but i think this works only in their browser based P2P audio communications functionality but not for phone calls/ VOIP.
Twilio and Plivo allow you record the audio for batch/later processing.
Is there a library that will give me access to the audio streams in real time? If not, what would I need to build such a service from scratch?
Thanks

If you are open to using a media server (so that the call is not longe P2P but it's mediated by the media server using a B2B model), then perhaps the Kurento Media Server may solve your problem. Kurento Media Server makes possible to create processing capabilities which are applyied in real time onto the media streams. There are many examples in the documentation of computer vision and augmented reality algorithms applied in real time over the video streams. I've never seen an only-audio processing module, but it should be simple to implement just by creating an additional module, which is not too complex if you have some knowledge about C/C++ and media processing concepts.
Disclaimer: I'm part of the Kurento development team.

video chat. red5 faster/needed?? why not just p2p?

Pardon my ignorance, but I am researching making a video chatroom, and what I am finding just seems really counter intuitive to me. From what I have read, it sounds like the standard is for each user to stream their video to a media server, like red5, and then the server sends the stream to the other person. Intuitively it seems like this just adds a middle man that would add lag to the video streaming because it has to go to a server, then turn around and go to a person, rather then just directly to a person. Why not just p2p with something like adobe status/Cirrus? Just use the service to get the other users ip, and then stream them your video directly? Yet, it seems like almost everyone uses an FMS like red5..
What am I failing to understand here? What is the advantage of having this "middle man"?

It would require lots of bandwidth (download speeds may be high enough but uploads are usually low) to send the video to the viewers. NAT makes it difficult to connect to a specific computer (from the public side there is only one IP for the computers under the router).

Live media streaming involving different kinds of devices

I am working on a project which will involve http live media streaming from a variety of devices like android phones/tablets, iphone, ipad, browser,etc. It will be a 2 way communication for all the devices with multiple devices connected to a conversation. I have implemented it partially i.e. one way by capturing audio from android phone(native app) and streaming to a web browser(HTML5 app) with a PHP server using ffmpeg and cvlc. I wanted to know of the best way to go ahead about it. Like, if there are any standards to be followed. Also what kind of a server should I be using? I don't want to use any streaming servers like Red5. I would like to implement the streaming logic similar to Http LiveStreaming by apple. I have come across MPEG-DASH that seems to be a standard for http streaming. I still have to look deeper into it. I was also thinking of using NodeJS for its popularity with streaming. Another worry was how do I go about capturing of media from devices? As in, should I use the native capability of the devices to convert media into an mp4 or any container that it supports and then stream it to the server or capture audio and images for a particular period of time and then send it to server and create a common output(I am not really sure of this idea). The separate capture is basically for simplifying the process of video streaming from the server end to any device. I was also thinking if I could completely bypass the server in any cases like a phone to phone or phone to tablet connection.
I just wanted to be sure of the things I will be using/implementing so that I wouldn't have to make drastic changes later on. Any help is deeply appreciated. Thank you.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string