Hi, I would like to know whether, for example, if my own audioTrack is muted and I start speaking while muted, an event can be returned. This would be similar to Teams telling you that you are muted.
The more general question is probably whether we are able to track audio events while speaking, because I believe the dominant speaker event is the only speaking-related audio event I see in Twilio. Any hints on obtaining such an audio speaking event would be great.
Twilio developer evangelist here.
It sounds like you are using Twilio Video (since you mention dominant speaker events). Twilio Video itself doesn't have "audio speaking" events, and neither does the web platform.
You can, however, do some audio analysis in the browser to tell whether a person is making noise, and compare that with whether their audio track is currently enabled, in order to show a warning that they are speaking while muted.
To do so, you would need to access the localParticipant's audio track. From that you can get the underlying mediaStreamTrack, turn it into a MediaStream and then pass it to the Web Audio API for analysis. I have an example of doing this to show the volume of the localParticipant's audio here: https://github.com/philnash/phism/blob/main/client/src/lib/volume-meter.js.
Once you have that volume you can choose a threshold above which you decide a user is trying to speak, and then check whether that threshold is crossed while the user is muted.
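A minimal sketch of that approach, assuming you already have a connected twilio-video room and your own UI callback (the showMutedWarning name and the threshold value here are placeholders to be tuned), might look like this:

```javascript
// Sketch: warn when the local participant speaks while muted.
// `room` is a connected twilio-video Room; `showMutedWarning` is your own
// UI callback — both are assumptions for this example.
function watchForSpeakingWhileMuted(room, showMutedWarning) {
  // Grab the first local audio track published to the room.
  const publication = Array.from(room.localParticipant.audioTracks.values())[0];
  const audioTrack = publication.track;

  // Wrap the underlying MediaStreamTrack in a MediaStream for Web Audio.
  const stream = new MediaStream([audioTrack.mediaStreamTrack]);
  const audioContext = new AudioContext(); // may need resume() after a user gesture
  const source = audioContext.createMediaStreamSource(stream);
  const analyser = audioContext.createAnalyser();
  analyser.fftSize = 512;
  source.connect(analyser);

  const samples = new Uint8Array(analyser.frequencyBinCount);
  const SPEAKING_THRESHOLD = 30; // arbitrary; tune for your setup

  setInterval(() => {
    analyser.getByteFrequencyData(samples);
    const volume = samples.reduce((sum, v) => sum + v, 0) / samples.length;

    // `isEnabled` is false while the local track is muted.
    if (!audioTrack.isEnabled && volume > SPEAKING_THRESHOLD) {
      showMutedWarning();
    }
  }, 200);
}
```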
Let me know if that helps.
We have been testing the voice message feature, and some users reported they were not happy with the sound volume; in other words, they could not hear very well what was read to them.
My simple question is:
is it possible to control the sound volume of a Twilio voice message read to the called person?
Does any of you know a way to get the audio stream of a music platform and plug it into the Web Audio API?
I am doing a music visualizer based on the Web Audio API. It currently reads sound from the mic of my computer and processes a real-time visualization. If I play music loud enough, my viz works!
But now I'd like to move on and only read the sound coming from my computer, so that the visualization responds only to the music and not to other sounds such as people chatting.
I know I can buffer an MP3 file in that API and it would work perfectly. But in 2020, streaming music is very common, via Deezer, Spotify, SoundCloud, etc.
I know they all have an API, but they often offer an SDK where you cannot really do more than "play" music. There is no easy access to the stream of audio data. Maybe I am wrong, and that is why I am asking for your help.
Thanks
The way to stream music to Web Audio is to use a MediaElementAudioSourceNode or a MediaStreamAudioSourceNode. However, these nodes will output zeros unless you're allowed to access the data. This means you have to set the CORS property correctly on your end, and the server also has to allow the access through CORS.
A Google search will help with setting up CORS, but many sites won't allow access unless you have the right permissions. Then you are out of luck.
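As a rough sketch, connecting an audio element to an AnalyserNode with CORS enabled could look like this; the MP3 URL is a placeholder and the analysis only yields non-zero data if that server sends an appropriate Access-Control-Allow-Origin header:

```javascript
// Sketch: feed a remote audio file into an AnalyserNode for visualisation.
// The URL is a placeholder; if the server blocks cross-origin access,
// the analyser simply reports zeros.
const audio = new Audio();
audio.crossOrigin = 'anonymous';            // request CORS-enabled fetching
audio.src = 'https://example.com/track.mp3';

const audioContext = new AudioContext();    // may need resume() after a user gesture
const source = audioContext.createMediaElementSource(audio);
const analyser = audioContext.createAnalyser();

source.connect(analyser);
analyser.connect(audioContext.destination); // keep the music audible

const data = new Uint8Array(analyser.frequencyBinCount);
function draw() {
  analyser.getByteFrequencyData(data);      // zeros if CORS blocks access
  // ...render `data` in your visualiser...
  requestAnimationFrame(draw);
}

audio.play().then(draw);
```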
I found a "no-code" workaround. At least on Ubuntu 18.04, I am able to tell Firefox to use my speakers as the "microphone input".
You just have to select the right "mic" in the list when your browser asks for mic permission.
That solution is very convenient since I do not need to write platform-specific binding code to access the audio stream.
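If you do want to pick that "monitor" device from code instead of from the permission prompt, a sketch like the following could work; the label matching is an assumption, since the exact name of the monitor source (e.g. "Monitor of Built-in Audio" under PulseAudio) varies by system:

```javascript
// Sketch: programmatically select the speaker "monitor" as the input device.
// The /monitor/i label check is an assumption about how the loopback
// source is named on your system.
async function getSystemAudioStream() {
  // A first getUserMedia call is needed so device labels are populated.
  await navigator.mediaDevices.getUserMedia({ audio: true });

  const devices = await navigator.mediaDevices.enumerateDevices();
  const monitor = devices.find(
    (d) => d.kind === 'audioinput' && /monitor/i.test(d.label)
  );
  if (!monitor) throw new Error('No monitor/loopback input device found');

  // Re-request audio, pinned to the monitor device.
  return navigator.mediaDevices.getUserMedia({
    audio: { deviceId: { exact: monitor.deviceId } },
  });
}
```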
I'm trying to migrate a music player Alexa Skill to Google Home, but I cannot find a pre-built music playback service (in Actions on Google or Dialogflow)... I want to play streaming music from my own music server (not from Spotify or Google Music).
I found a couple of examples using buildRichResponse and/or MediaObject, but these are not exactly a playback service.
Does anyone know if Google Home has a multimedia playback service, or an easy way to do this?
Thx
The Assistant's Media response is the nearest parallel to Alexa's AudioPlayer, although there are clearly differences between the two:
Alexa's playback is done outside the context of a session / conversation. So once you start a playback, you only have the playback controls available. Assistant Media controls are part of a conversation, so you can fully handle anything the user might say.
One consequence of this is that Alexa treats the playback as the result of the skill, while the Assistant treats it as part of the Action.
Google only sends an event when media playback has finished, and doesn't give any indication of why it has finished. Alexa reports more of the controls and has more events describing the state of the playback.
This makes it fairly easy to "queue up" the next audio for Alexa, but that brings additional complexity for how to handle when the queue ends up being wrong at the last moment. The Assistant doesn't have any way to queue the next audio, so there ends up being a gap between the audio ending and the next beginning while the event is handled on the server.
Although the approaches are slightly different, both seem to offer a basic long-audio playback service.
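For reference, a Media response from Dialogflow fulfillment using the actions-on-google Node.js library looks roughly like the sketch below; the intent names and the URLs pointing at your own music server are placeholders:

```javascript
// Sketch: returning a long-form audio Media response from Dialogflow
// fulfillment. Intent names and the self-hosted MP3/icon URLs are placeholders.
const { dialogflow, MediaObject, Suggestions, Image } = require('actions-on-google');

const app = dialogflow();

app.intent('play.music', (conv) => {
  conv.ask('Here is your track.');
  conv.ask(new MediaObject({
    name: 'My track',
    url: 'https://my-music-server.example.com/track.mp3',
    description: 'Streamed from my own server',
    icon: new Image({
      url: 'https://my-music-server.example.com/icon.png',
      alt: 'Album art',
    }),
  }));
  // Media responses on screened devices need suggestion chips to continue.
  conv.ask(new Suggestions('Next track', 'Stop'));
});

// Handle the event Google sends when playback finishes. This assumes a
// Dialogflow intent named 'media.status' wired to the
// actions_intent_MEDIA_STATUS event.
app.intent('media.status', (conv) => {
  conv.ask('The track has finished. Want to hear another one?');
  conv.ask(new Suggestions('Yes', 'No'));
});

// Expose `app` via your HTTP framework, e.g. functions.https.onRequest(app).
module.exports = app;
```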
This doesn't sound like what you are trying to do, but if you are looking for something slightly more static, you can also look at the content actions that Google supports.
See https://github.com/Limag/aiplayer/ for an example of playing self-hosted MP3s. Unfortunately, even a volume change will not be recognized, and it seems there is no way to add this.
If you use Google Play Music, you can upload MP3s with a tiny helper application provided by Google. Google Play Music works well, but has some other disadvantages, e.g. it is unusable for audiobooks, since playlists always start from the beginning.
I'd like to be able to capture the audio from the audio card of my computer and dispatch it with WebRTC. However, I am not sure whether it's possible to access the audio produced directly by my computer.
According to this repo, https://github.com/niklasenbom/RecordingApp/blob/master/app.js, there is some system-audio handling, but I'm not sure it's what I'm looking for.
Thanks,
You can do it by using NAudio. Actually, I did the same project myself and will put it on GitHub in a few weeks and update this answer. You can configure the frequency etc. and use its OnDataAvailable event to dispatch the sound to registered clients.
I am working on a project for large-group broadcasting in WebRTC. Since it needs to work on iOS and Android devices, I am using Kurento and the iOSWEBRTC Cordova plugin to build this. I am curious whether anyone can help improve my plan, or whether there is an easier way to achieve this.
We need to have a video/audio conference with 5 people per room; however, we also need to be able to show that video to large audiences. My idea would be to use Kurento as a middle-man and capture the streams into .webm files for live playback while the conference is going on.
Is there a better way to achieve this? And how would I play back the .webm file as it is being recorded? It needs to update and continue playing as more video is sent, basically a live-stream copy of the camera.
I am unsure whether this is the best route, but I figured it would reduce the bandwidth compared with my original idea, which was:
a 5-person conference for the broadcasters, with X number of viewers each downloading those streams directly. However, I realized the upload bandwidth requirement would be crazy high, which is why I settled on this idea. Additionally, the viewers do not have to see the conference in real time like the broadcasters do; the broadcasters need to be able to see and communicate with each other at the same time, while the viewers can be a few seconds behind.
TL;DR:
Trying to make a 5-person video conference with video/audio capture, then live-stream it to the viewers' players. This would avoid PeerConnection bandwidth limitations. Would this work, or am I forgetting something?
You'll need to look into using an SFU or MCU. An MCU is very costly, but multiplexes video streams and sends down a single video stream to all peers, and can also record that stream. An SFU is a single point of receipt of all streams, and selectively forwards them to clients. It could record off individual streams and then you could do post-processing to make a single recording out of the multiple recorded streams. A mesh network of connections really doesn't work for this use case.