Sonos integration with announcement system - Node.js

I am currently looking into the possibility of integrating an announcement system with Sonos, but I have yet to find a reasonable approach and have started wondering if it is currently possible at all.
My initial approach was having Sonos subscribe to a radio station that would send a constant stream of announcements. After testing this setup, I have been unable to get the delay below 3 seconds (which is too long).
I then began looking into the Sonos API. Looking at the documentation and the diagram below, I came to the conclusion that what I was trying to achieve should in fact be possible with Sonos.
It does seem, however, that it will require substantial effort to implement a service where I can stream audio to Sonos directly, so I was hoping I could get some things cleared up before I proceed with a rather costly (in time) implementation.
Is it possible to get the audio delay below 3 seconds when streaming directly?
Am I correct in understanding that I will need to write an app on the Sonos platform to handle my requests?
If the answer to the above is no, what other options are available?

MSB,
Our current public APIs are really not well designed for what you are trying to do. There are some open-source projects working off reverse-engineered APIs that may work a little better for you.

Just use my library to play a notification.
const SonosDevice = require('@svrooij/sonos').SonosDevice
const sonos = new SonosDevice(process.env.SONOS_HOST || '192.168.96.56')
sonos.PlayNotification({
  trackUri: 'https://cdn.smartersoft-group.com/various/pull-bell-short.mp3', // Can be any URI Sonos understands
  // trackUri: 'https://cdn.smartersoft-group.com/various/someone-at-the-door.mp3', // Cached text-to-speech file.
  onlyWhenPlaying: false, // Set to true to only play when you're already listening to music, so it won't play when you're sleeping.
  timeout: 10, // If the events don't work (to see when it stops playing), or if you turned on a stream, it will revert back after this many seconds.
  volume: 15, // Set the volume for the notification (and revert back afterwards)
  delayMs: 700 // Pause between commands in ms (for when Sonos fails to play short notification sounds).
})
  .then(played => {
    console.log('Played notification %o', played)
  })
Works in Node/TypeScript and has a lot of possibilities.

Related

What's the best way to stream live webcam video to a server and back to the web?

I need some help.
What is the best way to set up live streaming over the web, from my webcam to the server and back to multiple users?
Essentially I'm trying to create a group video chat application that can support many users.
I don't want it to be peer to peer webRTC.
I actually managed to make it work with getUserMedia() -> MediaRecorder -> ondataavailable -> pass blob chunks to Node.js via Socket.IO -> Socket.IO sends the blob chunks back to the other connected users -> append those chunks to a SourceBuffer that's connected to a MediaSource that's set as the source URL of a <video> element.
And it actually worked! BUT it's slow, laggy, and resource intensive. The chunks get passed at about 20 per second, and it's slowing the page down a lot; I don't think you're supposed to pass that many blobs to the SourceBuffer so quickly. Just as a test, I tried saving media recordings every 3 seconds (so it's less resource intensive) and passing those WebM blobs to the SourceBuffer, but for some reason only the first WebM loads, and the others don't get added or start playing.
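For concreteness, a minimal sketch of the capture side of that pipeline (browser JavaScript; the 'chunk' event name and the 50 ms timeslice are illustrative assumptions, not from my actual code):

const socket = io(); // Socket.IO client, assumed to be loaded on the page
navigator.mediaDevices.getUserMedia({ video: true, audio: true })
  .then(stream => {
    const recorder = new MediaRecorder(stream, { mimeType: 'video/webm; codecs=vp8,opus' });
    recorder.ondataavailable = event => {
      if (event.data.size > 0) {
        socket.emit('chunk', event.data); // the server relays these blobs to the other users
      }
    };
    recorder.start(50); // emit a chunk roughly every 50 ms (~20 per second, as described)
  });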
It just can't work for a production app this way.
What's the "RIGHT" way to do this?
How to pass a video stream from webcam to a Node.js server properly?
And how to stream this live stream back to the web from the Node.js server so that we can have a group video chat?
I'm a bit lost. Please help.
Do I use HLS? RecordRTC?
Do I stream from Node.js via HTTP or via Socket.IO?
There are services that already let you do this easily, like the Vonage Video API (TokBox), but those seem to be very expensive?
I want to run the video streaming through my own Node.js server that I control.
What's the best way to do this?
Please help.
Thank you
Essentially I'm trying to create a group video chat application that can support many users.
I don't want it to be peer to peer webRTC.
Video chat requires low latency, and therefore requires usage of WebRTC. Remember that one of the "peers" can actually be a server.
And it actually worked! BUT it's so slow and laggy and resource intensive.
Video encoding/decoding is resource intensive no matter how you do it. If by "slow" and "laggy" you mean high latency, then yes, recording chunks, sending chunks, decoding chunks, will have higher latency by its very nature. Additionally, what you're describing won't drop frames or dynamically adjust the encoding, so if a connection can't keep up, it's just going to buffer until it can. This is a different sort of tradeoff than what you want.
Again, for a video chat, realtime-ness is more important than quality and reliability. If that means discarding frames, resampling audio stupid-fast to catch up, encoding at low bitrates, even temporarily dropping streams entirely for a few seconds, that's what needs to happen. This is what the entire WebRTC stack does.
The chunks get passed at about 20 per second, and it's slowing the page down a lot. I don't think you're supposed to pass that many blobs to the SourceBuffer so quickly.
No, this is unlikely to be your problem. The receiving end probably just can't keep up with decoding all of these streams.
Do I use HLS?
Not for anyone actively participating in the chat... people who require low latency. For everyone else, yes you can utilize HLS and DASH to give you a more affordable way to distribute your stream over existing CDNs. See this answer: https://stackoverflow.com/a/37475943/362536 Basically, scrutinize your requirements and determine if everyone is actually participating. If they aren't, move them to a cheaper streaming method than WebRTC.
RecordRTC?
No, this is irrelevant to your project and frankly I don't know why people keep using this library for anything. Maybe they have some specific use case for it I don't know about, but browsers have had built-in MediaRecorder for years.
There are services that already let you do that easily, like the Vonage Video API (TokBox), but those seem to be very expensive?
This is an expensive thing to do. I think you'll find that using an existing service that already has the infrastructure ready to go is going to be cheaper than doing it yourself in most cases.

Get intermediate result while streaming audio with streaming_detect_intent

I followed this example and managed to collect the audio buffers from my microphone and send them to Dialogflow:
https://cloud.google.com/dialogflow-enterprise/docs/detect-intent-stream
But this processing is sequential: I first have to collect all the audio buffers, and only afterwards can I send them to Dialogflow.
That gives me the correct result and also the intermediate results, but only after the person has stopped talking.
I would like to send (stream) the audio buffers to Dialogflow instantly, while somebody is still talking, and get the intermediate results right away.
Does anybody know if this is possible and can point me in the right direction?
My preferred language is Python.
Thanks a lot!
I got this answer from the Dialogflow support team:
From the Dialogflow documentation: "Recognition ceases when it detects the audio's voice has stopped or paused. In this case, once a detected intent is received, the client should close the stream and start a new request with a new stream as needed." This means that the user has to stop/pause speaking in order for you to send the audio to Dialogflow.
In order for Dialogflow to detect a proper intent, it has to have the full user utterance.
If you are looking for real-time speech recognition, look into our Speech-to-Text product (https://cloud.google.com/speech-to-text/).
While trying to do something similar recently, I found that someone already had this problem and figured it out. Basically, you can feed an audio stream to Dialogflow via the streamingDetectIntent method and get intermediate results as valid language is recognized in the audio input. The tricky bit is that you need to set a threshold on your input stream so that the stream is ended once the user stops talking for a set duration. Closing the stream serves the same purpose as reaching the end of an audio file, and triggers the intent-matching attempt.
The solution linked above uses SoX to stream audio from an external device. The nice thing about this approach is that SoX already has options for setting audio level thresholds to start/stop the streaming process (look at the silence option), so you can fine-tune the settings to work for your needs. If you're not using Node.js, you may need to write your own utility to handle initiating the audio stream, but hopefully this can point you in the right direction.
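A rough sketch of that flow in Node.js (assuming the @google-cloud/dialogflow client and the node-record-lpcm16 microphone wrapper, which shells out to SoX; the project/session names and the silence threshold are illustrative):

const dialogflow = require('@google-cloud/dialogflow')
const recorder = require('node-record-lpcm16') // uses SoX under the hood
const { Transform } = require('stream')

const sessionClient = new dialogflow.SessionsClient()
const sessionPath = sessionClient.projectAgentSessionPath('my-project', 'my-session')

const detectStream = sessionClient
  .streamingDetectIntent()
  .on('data', data => {
    if (data.recognitionResult) {
      // Intermediate transcript, delivered while the user is still talking.
      console.log('Partial:', data.recognitionResult.transcript)
    } else if (data.queryResult) {
      // Final intent match, produced once the stream is closed.
      console.log('Intent:', data.queryResult.intent.displayName)
    }
  })

// The first message carries the config; later messages carry raw audio.
detectStream.write({
  session: sessionPath,
  queryInput: {
    audioConfig: {
      audioEncoding: 'AUDIO_ENCODING_LINEAR_16',
      sampleRateHertz: 16000,
      languageCode: 'en-US',
    },
    singleUtterance: true,
  },
})

// Stream the microphone; silence: '2.0' stops recording (and so ends the
// stream) after two seconds of quiet, which triggers the intent match.
recorder
  .record({ sampleRate: 16000, threshold: 0.5, silence: '2.0' })
  .stream()
  .pipe(new Transform({
    objectMode: true,
    transform (chunk, _enc, next) { next(null, { inputAudio: chunk }) },
  }))
  .pipe(detectStream)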

Unity3D Socket.IO Basic Movement Latency

I'm trying to make a server with Node.js and Socket.IO. I want to make an online game that just has a simple movement event. I wrote some code, but it doesn't work very well: I get a lot of latency. You can check the video below. By the way, I tested it on my DigitalOcean servers and even on localhost. What is the trick to my problem? I really would like to learn network programming, but I always get stuck.
Latency gameplay YouTube link -- especially check the 13th second
Project GitHub link
I will explain all of my work below, but if you want to check the detailed code, you can visit the GitHub project link.
On the client side, I used Unity3D. If the user presses any arrow key, like UpArrow or RightArrow, I send this information to the server. That way the server knows which direction I want to move.
if (Input.GetKey(KeyCode.UpArrow))
{
    Movement = Vector3.up;
    JSONObject data = new JSONObject();
    data.AddField("x", Movement.x);
    data.AddField("y", Movement.y);
    socket.Emit("move", data);
}
// if (Input.GetKey(KeyCode.RightArrow)) ... and so on for the other arrow keys
On the server side, I just used Node.js and Socket.IO. I created an interval function that simply sends all players to the clients, firing 60 times per second. You can see the code below.
setInterval(function () {
  io.emit('state', players);
}, 1000 / 60);
By the way, when the server receives a move event it does this:
socket.on('move', function (data) {
  var player = players[socket.id] || {};
  player.x = player.x + (data.x * 0.1);
  player.y = player.y + (data.y * 0.1);
});
You also need to move locally: when emitting the "move" command on the client side, start moving the player on the client immediately. When you receive new information from the server, merge the two. The keyword is interpolation.
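A minimal sketch of that idea (illustrative JavaScript; the 0.1 step mirrors the server code above, and the blend factor is an assumption):

var LERP_FACTOR = 0.2; // how aggressively to pull toward the server state

// On local input: predict the movement immediately, mirroring the server's step.
function onLocalInput(player, dx, dy) {
  player.x += dx * 0.1;
  player.y += dy * 0.1;
}

// On a server 'state' message: merge by nudging toward the authoritative position.
function onServerState(player, serverPos) {
  player.x += (serverPos.x - player.x) * LERP_FACTOR;
  player.y += (serverPos.y - player.y) * LERP_FACTOR;
}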
First off, you do not need an interval unless you are doing real-time physics simulation on the server in Node.
Second, think of Node.js in a more event-driven manner, i.e. only send updates to other players when a player performs an action near them (see the sketch after this list).
Third, use client-side prediction, i.e. go ahead and move the player on the client even though the server hasn't received the input yet, then interpolate based on the last position that the server said was valid. A simple linear interpolation will work just fine if you send a timestamp from the server.
Fourth, ditch Socket.IO. Socket.IO is notoriously slow when it comes to WebSockets. If you want to use WebSockets, then I recommend just using the Node library ws, which is efficient and fast. Or, if you want an event-driven library like Socket.IO but based on the ws library, you can try my custom library, which I use and which has been tested in multiple online games: https://github.com/Metric/data.io. I still actively maintain it and will be pushing out an update to the C# client in a couple of weeks; that update fixes some issues I found while using it in a new project recently.
However, raw TCP would be more efficient than WebSockets, which have increased overhead compared to a plain TCP or UDP connection. Either way, you will still get some delay and will still need client-side prediction.
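As a sketch of the second point (illustrative only; the event names and the 0.1 step follow the question's code):

// Instead of a fixed 60 Hz interval, push state only when it actually changes.
socket.on('move', function (data) {
  var player = players[socket.id] || {};
  player.x += data.x * 0.1;
  player.y += data.y * 0.1;
  // Notify the other clients about just this player's new position.
  socket.broadcast.emit('state', { id: socket.id, x: player.x, y: player.y });
});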
For further info on networking and prediction see: http://gafferongames.com/networking-for-game-programmers/
He covers all the concepts with some code examples as well.
Also see: https://developer.valvesoftware.com/wiki/Source_Multiplayer_Networking

XMLHttpRequest detect new data from server before res.end()

Is it possible to detect new data from the server as it is sent? For example, with express.js:
res.write('Processing 14% or something');
and then display that on the page with a progress bar.
Edit:
My original question was a bit confusing, so let me explain the situation. I have created a page where users can upload song files. These files are then converted (using ffmpeg) to .ogg and .mp3 files for the web. The conversion takes a long time. Is it possible to send real-time data about the conversion back to the client using the same XMLHttpRequest that sent the files?
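For reference, partial responses written with res.write can be observed on the client as they arrive; a minimal sketch, assuming a plain-text response from a hypothetical /convert endpoint:

var xhr = new XMLHttpRequest();
xhr.open('GET', '/convert');
var seen = 0;
xhr.onprogress = function () {
  var chunk = xhr.responseText.slice(seen); // only the newly arrived part
  seen = xhr.responseText.length;
  console.log('Server said:', chunk); // e.g. "Processing 14% or something"
};
xhr.send();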
If I understand correctly, you are trying to implement event-based updates. Yes, Node.js has some excellent WebSocket libraries such as Socket.IO and SockJS.
You need to understand Node.js's event-driven pattern.
The WebSocket protocol helps maintain a full-duplex connection between server and client. You can notify clients when an action happens on the server, and similarly you can notify the server when an action happens on the client. The libraries give you the flexibility to broadcast an event to all connected clients or only to selected ones.
So it is basically emit and on that you will be using most often.
Go through the documentation; it will not take much time to learn. Let me know if you need any help.
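A minimal sketch of that pattern for the conversion use case (Socket.IO assumed; the event names and the hard-coded progress value are illustrative):

// Server: emit progress events while ffmpeg converts the upload.
const http = require('http').createServer();
const io = require('socket.io')(http);

io.on('connection', socket => {
  socket.on('convert', file => {
    // ...start the ffmpeg conversion here, then on each progress update:
    socket.emit('progress', { percent: 14 });
  });
});

http.listen(3000);

// Client: drive the progress bar from those events.
// socket.on('progress', function (data) {
//   progressBar.style.width = data.percent + '%';
// });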

vLine - Any way to prevent vLine from "hanging up" after a certain amount of time waiting for the other user to connect?

I am testing out vLine's API and basically comparing it to OpenTok. The main issue/difference I am running across is that I would like one side to be able to log in to our system, start transmitting audio/video, and then just wait for the other party to connect at some point in the future (typically 5-10 minutes max) and have them connect up automatically. This can be done fairly easily with OpenTok, since they do not have the same concept of "calling" another user; instead, you transmit a stream to a session, and users in the session can receive those transmissions.
With vLine, the problem is that after calling startMedia(), if no one answers within about 15 seconds, the media stream seems to stop automatically. Our goal is to have the first user see themselves in a camera view until the other party connects, at which point they will see both themselves and the other party.
Is this possible with vLine?
Yes, this is possible. I went ahead and created a simple example that is available on GitHub: https://github.com/vline/vline-room-example
NOTE: We are working on a "Room API" that will make this example much simpler.
