Get intermediate result while streaming audio with streaming_detect_intent - dialogflow-es

I followed this example and managed to collect the audio buffers from my microphone and send them to Dialogflow.
https://cloud.google.com/dialogflow-enterprise/docs/detect-intent-stream
But this processing is sequential: I first have to collect all the audio buffers before I can send them to Dialogflow, and only then do I get the final result along with the intermediate results. In other words, I have to wait for the person to stop talking before I can send the collected audio buffers to Dialogflow.
I would like to stream the audio buffers to Dialogflow instantly, while somebody is still talking, and also get the intermediate results right away.
Does anybody know if this is possible and can point me in the right direction?
My preferred language is Python.
Thanks a lot!

I got this answer from the Dialogflow support team:
From the Dialogflow documentation: Recognition ceases when it detects that the audio's voice has stopped or paused. In this case, once a detected intent is received, the client should close the stream and start a new request with a new stream as needed. This means that the user has to stop/pause speaking in order for you to send the audio to Dialogflow.
In order for Dialogflow to detect a proper intent, it has to have the full user utterance.
If you are looking for real-time speech recognition, look into our Speech-to-Text product (https://cloud.google.com/speech-to-text/).

While trying to do something similar recently, I found that someone already had this problem and figured it out. Basically, you can feed an audio stream to Dialogflow via the streamingDetectIntent method and get intermediate results as valid language is recognized in the audio input. The tricky bit is that you need to set a threshold on your input stream so that the stream is ended once the user stops talking for a set duration. The closing of the stream serves the same purpose as reaching the end of an audio file, and triggers the intent matching attempt.
The solution linked above uses SoX to stream audio from an external device. The nice thing about this approach is that SoX already has options for setting audio level thresholds to start/stop the streaming process (look at the silence option), so you can fine-tune the settings to work for your needs. If you're not using NodeJS, you may need to write your own utility to handle initiating the audio stream, but hopefully this can point you in the right direction.
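For Python, the shape can look roughly like the sketch below. This is not the official sample: it assumes the google-cloud-dialogflow and PyAudio packages, placeholder PROJECT_ID / SESSION_ID values, and a simple time-based cutoff instead of a proper silence threshold. With single_utterance enabled, Dialogflow itself ends recognition once it detects the speaker has stopped, and interim transcripts arrive while the person is still talking.

# Rough sketch only; project/session IDs, sample rate and the time-based cutoff are placeholders.
import time
import pyaudio
from google.cloud import dialogflow

PROJECT_ID = "my-project"     # placeholder
SESSION_ID = "my-session"     # placeholder
SAMPLE_RATE = 16000
CHUNK = 1024                  # frames per read from the microphone

def request_generator(session_path, mic, max_seconds=15):
    # The first request carries only the audio config; single_utterance lets
    # Dialogflow end recognition itself once the speaker stops or pauses.
    audio_config = dialogflow.InputAudioConfig(
        audio_encoding=dialogflow.AudioEncoding.AUDIO_ENCODING_LINEAR_16,
        language_code="en-US",
        sample_rate_hertz=SAMPLE_RATE,
        single_utterance=True,
    )
    query_input = dialogflow.QueryInput(audio_config=audio_config)
    yield dialogflow.StreamingDetectIntentRequest(
        session=session_path, query_input=query_input
    )
    # Subsequent requests stream raw microphone chunks as they are captured.
    deadline = time.time() + max_seconds
    while time.time() < deadline:
        chunk = mic.read(CHUNK, exception_on_overflow=False)
        yield dialogflow.StreamingDetectIntentRequest(input_audio=chunk)

def main():
    client = dialogflow.SessionsClient()
    session_path = client.session_path(PROJECT_ID, SESSION_ID)

    pa = pyaudio.PyAudio()
    mic = pa.open(format=pyaudio.paInt16, channels=1, rate=SAMPLE_RATE,
                  input=True, frames_per_buffer=CHUNK)

    responses = client.streaming_detect_intent(
        requests=request_generator(session_path, mic)
    )
    for response in responses:
        # Interim transcripts arrive while the user is still talking; the
        # matched intent arrives once Dialogflow decides the utterance ended.
        if response.recognition_result.transcript:
            print("interim:", response.recognition_result.transcript)
        if response.query_result.intent.display_name:
            print("intent:", response.query_result.intent.display_name)

if __name__ == "__main__":
    main()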

Related

Are there any ways to stream video via Redis for (near) real-time streaming?

We have a Redis server that all clients attach to for a variety of data transfer and coordination tasks. We have a new requirement that we support video streaming. I would like to avoid running a dedicated service (with all the accompanying network and security requirements that entails) and just stream over Redis.
Redis seems like a good fit for real-time streaming, in particular using Redis streams. I realize that "Redis streams" have no relation to "video streaming"; however, our use case fits the Redis stream structure well. We want to buffer X seconds of video continuously, allowing clients to attach to that real-time stream at any time. We have no need to store history or serve static video content.
Redis seems like a good solution; my problem is that I don't know how to stream an appropriate video codec (Motion JPEG maybe?) over Redis. I wouldn't know how to join a stream mid-broadcast (join at a keyframe perhaps?). I also wouldn't know how to serialize the stream to bytes at the server (Python based) and deserialize it into a video codec and player on the client (a browser). Perhaps it's as simple as serialization/deserialization in OpenCV or equivalent and I'm just overthinking it?
These are all features I would typically look to an API to perform, but is there an API capable of this? I'm inexperienced in the field of video streaming.
At a high level, I prefer viewing streaming as a pub-sub problem. Where producers produce chunks of information and consumers read that information on need basis.
A ready-made solution may not be available, so we may need to perform the following steps ourselves (see the sketch after this list):
Publish:
1. chunk-id : content
2. chunk-id-fwd : (nextChunkId)
3. videoId : latestChunkId (assuming your real-time use case is live streams, this lets viewers hit a 'go live' button)
Consume:
Start:
1. Get latest chunk
2. Get content from latest chunkId
3. Get nextChunkId from chunk-id-fwd
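A rough Python sketch of that key scheme using redis-py; the key names, the expiry window, and the simple polling loop are illustrative assumptions rather than a production design:

# Producer stores chunks and forward pointers; consumers start at the latest chunk and follow pointers.
import time
import redis

r = redis.Redis(host="localhost", port=6379)

def publish_chunk(video_id, chunk_id, content):
    """Producer side: store the chunk, link it from its predecessor, move the live pointer."""
    pipe = r.pipeline()
    pipe.set(f"chunk:{video_id}:{chunk_id}", content, ex=30)            # 1. chunk-id : content (expires after the buffer window)
    pipe.set(f"chunk-fwd:{video_id}:{chunk_id - 1}", chunk_id, ex=30)   # 2. chunk-id-fwd : nextChunkId
    pipe.set(f"video:latest:{video_id}", chunk_id)                      # 3. videoId : latestChunkId ('go live' pointer)
    pipe.execute()

def consume(video_id):
    """Consumer side: attach at the latest chunk, then follow the forward pointers."""
    chunk_id = int(r.get(f"video:latest:{video_id}"))
    while True:
        content = r.get(f"chunk:{video_id}:{chunk_id}")
        if content is not None:
            yield content                        # hand the encoded chunk to the player/decoder
        next_id = r.get(f"chunk-fwd:{video_id}:{chunk_id}")
        while next_id is None:                   # producer hasn't published the next chunk yet
            time.sleep(0.01)
            next_id = r.get(f"chunk-fwd:{video_id}:{chunk_id}")
        chunk_id = int(next_id)

What goes inside each chunk (e.g. self-contained MJPEG frames, or fragments that always start on a keyframe) still has to be solved at the codec level; Redis only moves the bytes around.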

Nodejs Audio Stream - Main Producing Client -> Server -> Multiple client listeners

I want to build an internet radio station using Node.js. My architecture needs to be like the following: there is one producer who records live audio, and this live audio data needs to be sent to the server. On the server side, I need to save the live audio stream in some audio format (for future playback) and also simultaneously stream the live audio to multiple clients. Can somebody please point me towards some implementations or libraries available to achieve this? I have read several posts and Stack Overflow answers but couldn't find anything related to my need.
Is WebRTC needed? I don't want clients to get a peer-to-peer connection, as I also want to save the live audio on the server. Any help would be appreciated.
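Not a library recommendation, but to make the described shape concrete, here is a very rough sketch (in Python with aiohttp rather than Node, purely to illustrate the data flow): one producer streams audio chunks to the server, which appends them to a file and fans them out to every connected listener over WebSockets. The endpoint names, raw-chunk handling, and output file are made-up assumptions; a real station would add proper container/codec handling.

# Rough fan-out sketch; /produce, /listen and broadcast.raw are illustrative only.
from aiohttp import web, WSMsgType

listeners = set()   # open WebSocket connections of listening clients

async def produce(request):
    # The single producer streams audio chunks in the request body.
    with open("broadcast.raw", "ab") as archive:
        async for chunk in request.content.iter_chunked(4096):
            archive.write(chunk)                    # save for future playback
            dead = set()
            for ws in listeners:
                try:
                    await ws.send_bytes(chunk)      # relay live to every listener
                except ConnectionResetError:
                    dead.add(ws)
            listeners.difference_update(dead)
    return web.Response(text="stream ended")

async def listen(request):
    ws = web.WebSocketResponse()
    await ws.prepare(request)
    listeners.add(ws)
    try:
        async for msg in ws:                        # keep the socket open until the client leaves
            if msg.type == WSMsgType.ERROR:
                break
    finally:
        listeners.discard(ws)
    return ws

app = web.Application()
app.router.add_post("/produce", produce)
app.router.add_get("/listen", listen)

if __name__ == "__main__":
    web.run_app(app, port=8080)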

What's the best way to STREAM LIVE WEBCAM to SERVER and BACK TO THE WEB?

I need some help.
What is the best way to set up LIVE STREAMING over the web from my WEBCAM to the server and back to multiple users?
Essentially I'm trying to create a group video chat application that can support many users.
I don't want it to be peer to peer webRTC.
I actually managed to make it work with getUserMedia() -> MediaRecorder -> ondataavailable -> pass blob chunks to Node.js via socket.io -> socket.io sends back blob chunks to other connected users -> append those chunks to a SourceBuffer that's connected to a MediaSource that's set as the source URL on a video element.
And it actually worked! BUT it's so slow and laggy and resource intensive. These chunks get passed like 20 per second, and it's slowing the page down a lot. I don't think you're supposed to pass that many blobs to the SourceBuffer so quickly. Just as a test I tried saving MediaRecorder recordings every 3 seconds (so it's not that resource intensive) and passing those webm blobs to the SourceBuffer, but for some reason only the first webm loads; the other ones don't get added or start playing.
It just can't work for a production app this way.
What's the "RIGHT" way to do this?
How to pass a video stream from webcam to a Node.js server properly?
And how to stream this live stream back to the web from the Node.js server so that we can have a group video chat?
I'm a bit lost. Please help.
Do I use HLS? RecordRTC?
Do I stream from Node.js via http or via socket.io?
There are services that already let you do that easily like vonage video api tokbox but those seem to be very expensive?
I want to run the video streaming through my own Node.js server that I control.
What's the best way to do this?
Please help.
Thank you
Essentially I'm trying to create a group video chat application that can support many users.
I don't want it to be peer to peer webRTC.
Video chat requires low latency, and therefore requires usage of WebRTC. Remember that one of the "peers" can actually be a server.
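To make "the server as a peer" concrete, here is a minimal sketch, written in Python with aiortc purely for illustration (the same shape applies to Node WebRTC stacks): the server answers a browser's offer like any other peer and can record or re-route the incoming tracks. The /offer endpoint, the recording filename, and the missing ICE/cleanup handling are simplifying assumptions.

# Minimal server-as-WebRTC-peer sketch; error handling and teardown omitted.
from aiohttp import web
from aiortc import RTCPeerConnection, RTCSessionDescription
from aiortc.contrib.media import MediaRecorder

pcs = set()  # keep peer connections alive for the lifetime of the process

async def offer(request):
    params = await request.json()
    offer = RTCSessionDescription(sdp=params["sdp"], type=params["type"])

    pc = RTCPeerConnection()
    pcs.add(pc)
    recorder = MediaRecorder("recording.mp4")   # save the incoming stream server-side

    @pc.on("track")
    def on_track(track):
        # The server receives the browser's audio/video like any WebRTC peer;
        # here it just records, but it could also forward tracks to others (SFU-style).
        recorder.addTrack(track)

    await pc.setRemoteDescription(offer)
    await recorder.start()
    answer = await pc.createAnswer()
    await pc.setLocalDescription(answer)
    return web.json_response(
        {"sdp": pc.localDescription.sdp, "type": pc.localDescription.type}
    )

app = web.Application()
app.router.add_post("/offer", offer)

if __name__ == "__main__":
    web.run_app(app, port=8080)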
And it actually worked! BUT it's so slow and laggy and resource intensive.
Video encoding/decoding is resource intensive no matter how you do it. If by "slow" and "laggy" you mean high latency, then yes, recording chunks, sending chunks, decoding chunks, will have higher latency by its very nature. Additionally, what you're describing won't drop frames or dynamically adjust the encoding, so if a connection can't keep up, it's just going to buffer until it can. This is a different sort of tradeoff than what you want.
Again, for a video chat, realtime-ness is more important than quality and reliability. If that means discarding frames, resampling audio stupid-fast to catch up, encoding at low bitrates, even temporarily dropping streams entirely for a few seconds, that's what needs to happen. This is what the entire WebRTC stack does.
As these chunks get passed like 20 per second and it's slowing the page a lot. I don't think you're supposed to pass that many blobs to the sourceBuffer so quickly.
No, this is unlikely to be your problem. The receiving end probably just can't keep up with decoding all these streams.
Do I use HLS?
Not for anyone actively participating in the chat... people who require low latency. For everyone else, yes you can utilize HLS and DASH to give you a more affordable way to distribute your stream over existing CDNs. See this answer: https://stackoverflow.com/a/37475943/362536 Basically, scrutinize your requirements and determine if everyone is actually participating. If they aren't, move them to a cheaper streaming method than WebRTC.
RecordRTC?
No, this is irrelevant to your project and frankly I don't know why people keep using this library for anything. Maybe they have some specific use case for it I don't know about, but browsers have had built-in MediaRecorder for years.
There are services that already let you do that easily like vonage video api tokbox but those seem to be very expensive?
This is an expensive thing to do. I think you'll find that using an existing service that already has the infrastructure ready to go is going to be cheaper than doing it yourself in most cases.

Recommendation on client-side echo cancellation

I am developing an MCU-based VoIP service. I think the traditional way of doing an MCU is that you have N audio mixers at the server, and every participant in the call receives a stream that does not have their own voice encoded into it.
I guess what I wish to do is have only one audio mixer running at the server and, on a broadcast-style model, send the final mixed audio to every participant (for scalability, obviously).
Now this obviously creates the problem of hearing your own voice coming back from the speaker as the MCU’s output stream.
I am wondering if there is any “client-side echo cancellation” project that I can use to cancel the user's own voice at the desktop/mobile level.
The general approach is to filter/subtract each participant's own voice in the MCU. Doing this on the client side does not work: by the time the mixed stream arrives at the client it has been encoded, mixed and delayed, so the client has no reliable reference to subtract its own signal against.

Sonos integration with announcement system

I am currently looking into the possibility of integrating an announcement system with sonos but have yet to find a reasonable approach and have started wondering if it is currently possible at all.
My initial approach was having Sonos subscribe to a radio station that would send a constant stream of announcements. After testing with this setup, I have been unable to get the delay below 3 seconds (which is too long).
I then began looking into the Sonos API. Looking at the documentation and the accompanying diagram, I came to the conclusion that what I was trying to achieve was possible with Sonos after all.
It does seem, however, that it will require substantial effort to implement a service where I can stream audio to Sonos directly, so I was hoping I could get some things cleared up before I proceed with a rather costly (in time) implementation.
Is it possible to get audio delay below 3 seconds when streaming directly?
Am I correct in understanding that I will need to write an app on the Sonos platform to handle my requests?
If the answer to the above is no, what other options are available?
MSB,
Our current public APIs are really not well designed for what you are trying to do. There are some open source projects which are working off reverse engineered APIs that may work a little better for you.
Just use my library to play a notification.
const SonosDevice = require('@svrooij/sonos').SonosDevice
const sonos = new SonosDevice(process.env.SONOS_HOST || '192.168.96.56')
sonos.PlayNotification({
  trackUri: 'https://cdn.smartersoft-group.com/various/pull-bell-short.mp3', // Can be any uri Sonos understands
  // trackUri: 'https://cdn.smartersoft-group.com/various/someone-at-the-door.mp3', // Cached text-to-speech file.
  onlyWhenPlaying: false, // make sure it only plays when you're listening to music, so it won't play when you're sleeping.
  timeout: 10, // If the events don't work (to see when it stops playing) or if you turned on a stream, it will revert back after this many seconds.
  volume: 15, // Set the volume for the notification (and revert back afterwards)
  delayMs: 700 // Pause between commands in ms (when Sonos fails to play short notification sounds).
})
  .then(played => {
    console.log('Played notification %o', played)
  })
Works in Node/TypeScript and has a lot of possibilities.
