How to increase Dialogflow microphone audio input duration? - dialogflow-es

I am using Google Dialogflow ES in my PHP project. Right now Dialogflow accepts audio input of around 8-10 seconds. I want to increase this limit, i.e. Dialogflow should accept audio input of at least 30 seconds. How do I configure this?

Related

High latency when streaming to google cloud speech on first audio result only

I have Google Cloud Speech set up on a Raspberry Pi for a voice assistant, and while the entire system is working I am facing an unusual problem.
When I first boot up the system and send an audio stream to Google Cloud for real-time processing, it returns the result after about 15-20 seconds. Subsequent audio requests return within 1-2 seconds. This behaviour also reappears if the system is already running but I don't make a voice request for ~5-10 minutes.
I'm using the node.js SDK. Here is my config file:
config: {
  encoding: 'LINEAR16',
  sampleRateHertz: 16000,
  languageCode: 'en-us'
},
singleUtterance: true,
interimResults: false
I don't send more than 4 seconds of audio at a time, as the stream is forcibly closed if it exceeds this duration.
I am able to reproduce this issue consistently, but can't understand why only the first query after boot, or after a period of inactivity, takes so long to return a result.
Any ideas on how to approach debugging this?
Edit: I only seem to be facing this issue on the Raspberry Pi. There is no such first-query delay on my Mac.
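The symptoms (slow only on the first query after boot or after minutes of inactivity) are consistent with cold-start costs: DNS resolution, TLS/gRPC channel setup, and possibly server-side allocation, all of which hit harder on a Raspberry Pi. A common workaround, offered here as an assumption rather than a documented fix, is to keep the channel warm by sending a tiny request whenever the system has been idle too long. A minimal sketch of that keep-alive logic (`sendShortSilenceClip` is a hypothetical callback you would wire to your existing streamingRecognize pipeline):

```javascript
// Returns true when the channel has been idle long enough that the next
// real request would likely pay cold-start latency again.
function needsWarmup(lastRequestAt, now, idleThresholdMs) {
  return now - lastRequestAt >= idleThresholdMs;
}

// `sendShortSilenceClip` is a placeholder: in practice it would stream
// ~100 ms of silence through the same streamingRecognize setup.
function makeWarmKeeper(sendShortSilenceClip, idleThresholdMs = 4 * 60 * 1000) {
  let lastRequestAt = Date.now();
  return {
    noteRequest() { lastRequestAt = Date.now(); },  // call after each real query
    maybeWarmup() {                                 // call periodically, e.g. from setInterval
      if (needsWarmup(lastRequestAt, Date.now(), idleThresholdMs)) {
        sendShortSilenceClip();
        lastRequestAt = Date.now();
      }
    }
  };
}
```

Calling `maybeWarmup()` every minute or so from a `setInterval` would keep the idle gap below the ~5-10 minute window where the delay reappears.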

Azure 4k live streaming

The situation is: I want to send a live RTMP, 4K 60 fps, 80 Mb/s stream from my computer to the Azure Live Encoder and later stream it to YouTube/Facebook etc. In the specs I found that the Azure Live Encoder can receive at most FHD 30 fps. Is there any possibility to send a stream with the parameters mentioned above?
If not, can I achieve my goal using different Azure components?
Greetings,
Konrad
Currently, the live encoder in Azure Media Services does not support 4K ingest. You can send 4Kp60 streams to our pass-through LiveEvent, though - either with the H.264 codec or H.265/HEVC. You will need an encoder (e.g. MediaExcel) that generates a multiple-bitrate 4K/HEVC feed and sends it to our services. Please contact us directly via amshelp@microsoft.com if you want to discuss options.

Get intermediate result while streaming audio with streaming_detect_intent

I followed this example and managed to collect the audio buffers from my microphone and send them to Dialogflow:
https://cloud.google.com/dialogflow-enterprise/docs/detect-intent-stream
But this processing is sequential. I first have to collect all the audio buffers before I can send them to Dialogflow. Only then do I get the correct result and the intermediate results - that is, only after waiting for the person to stop talking.
I would like to send (stream) the audio buffers to Dialogflow instantly, while somebody is still talking, and get the intermediate results right away.
Does anybody know if this is possible and can point me in the right direction?
My preferred language is Python.
Thanks a lot!
I got this answer from the Dialogflow support team:
From the Dialogflow documentation: Recognition ceases when it detects the audio's voice has stopped or paused. In this case, once a detected intent is received, the client should close the stream and start a new request with a new stream as needed. This means that the user has to stop/pause speaking in order for you to send it to Dialogflow. In order for Dialogflow to detect a proper intent, it has to have the full user utterance.
If you are looking for real-time speech recognition, look into our Speech-to-Text product (https://cloud.google.com/speech-to-text/).
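The close-and-reopen pattern the support team describes can be sketched as a small wrapper. `openStream` here is a hypothetical factory that would create a new streamingDetectIntent call; only `write()` and `end()` are assumed on the stream object:

```javascript
// Wraps a stream factory so that each detected intent ends the current
// request and immediately opens a fresh stream for the next utterance.
function makeStreamManager(openStream) {
  let current = openStream();
  return {
    write(chunk) { current.write(chunk); }, // forward mic audio to the live stream
    onIntentDetected() {
      current.end();          // close the finished request, as the docs advise...
      current = openStream(); // ...and start a new request with a new stream
    }
  };
}
```

The microphone capture loop keeps calling `write()` without interruption; only the underlying request object is swapped out between utterances.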
While trying to do something similar recently, I found that someone already had this problem and figured it out. Basically, you can feed an audio stream to DialogFlow via the streamingDetectIntent method and get intermediate results as valid language is recognized in the audio input. The tricky bit is that you need to set a threshold on your input stream so that the stream is ended once the user stops talking for a set duration. The closing of the stream serves the same purpose as reaching the end of an audio file, and triggers the intent matching attempt.
The solution linked above uses SoX to stream audio from an external device. The nice thing about this approach is that SoX already has options for setting audio level thresholds to start/stop the streaming process (look at the silence option), so you can fine-tune the settings to work for your needs. If you're not using NodeJS, you may need to write your own utility to handle initiating the audio stream, but hopefully this can point you in the right direction.
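The threshold idea above can be sketched independently of SoX: an RMS-based silence gate over 16-bit PCM chunks that ends the stream after a run of quiet chunks, which is roughly what SoX's silence effect does for you. The threshold and chunk-count values below are assumptions to tune for your microphone; `onEndOfSpeech` is where you would call `stream.end()`:

```javascript
// Normalized RMS level (0..1 of Int16 full scale) for one audio chunk.
function rms(samples) {
  let sum = 0;
  for (const s of samples) sum += s * s;
  return Math.sqrt(sum / samples.length) / 32768;
}

// Calls onEndOfSpeech once, after `maxSilentChunks` consecutive chunks
// fall below `threshold`. Feed it Int16Array chunks from the microphone.
function makeSilenceGate(threshold, maxSilentChunks, onEndOfSpeech) {
  let silentRun = 0;
  let ended = false;
  return function push(samples) {
    if (ended) return;
    silentRun = rms(samples) < threshold ? silentRun + 1 : 0;
    if (silentRun >= maxSilentChunks) {
      ended = true;
      onEndOfSpeech(); // here you would end the streamingDetectIntent stream
    }
  };
}
```

With 100 ms chunks, `makeSilenceGate(0.05, 8, …)` would end the stream after roughly 800 ms of silence, triggering the intent match just as reaching end-of-file would.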

Continuous audio download stream

I'm looking to set up a server which will read from some audio input device and serve that audio continuously to clients.
I don't need the audio to necessarily be played by the client in real time; I just want the client to be able to start downloading from the point at which they join and stop when they leave.
So say the server broadcasts 30 seconds of audio data: a client could connect 5 seconds in and download 10 seconds of it (giving them 0:05 - 0:15).
Can you do this kind of partial download over TCP, starting whenever the client connects, and end up with a playable audio file?
Sorry if this question is a bit too broad and not a 'how do I set variable x to y' kind of question. Let me know if there's a better forum to post this in.
Disconnect the concepts of file and connection. They're not related. A TCP connection simply supports the reliable transfer of data. Nothing more. What your application chooses to send over that connection is its business, so you need to set your application in a way that it sends the data you want.
It sounds like what you want is a simple HTTP progressive internet radio stream, which is commonly provided by SHOUTcast and Icecast servers. I recommend Icecast to get started. The user connects, they get a small buffer of a few seconds up front to get them started (optional), and when they disconnect, that's it.
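To make the question's timing example concrete: for uncompressed LINEAR16 PCM (unlike the MP3/Ogg streams Icecast typically serves), time maps linearly to bytes, so a server can compute exactly which byte range a late-joining client should receive. The 16 kHz, mono, 16-bit format below is an assumption:

```javascript
// Bytes per second of raw PCM audio.
function bytesPerSecond(sampleRateHz, channels, bytesPerSample) {
  return sampleRateHz * channels * bytesPerSample;
}

// Byte range covering [startSec, endSec) of the broadcast, assuming
// 16-bit samples (2 bytes each).
function byteRangeForWindow(startSec, endSec, sampleRateHz = 16000, channels = 1) {
  const bps = bytesPerSecond(sampleRateHz, channels, 2);
  return { start: startSec * bps, end: endSec * bps };
}

// The question's example: a client joins 5 s in and listens for 10 s,
// i.e. it should receive the bytes for 0:05-0:15.
const range = byteRangeForWindow(5, 15);
// 16000 * 1 * 2 = 32000 bytes/s -> { start: 160000, end: 480000 }
```

The server keeps writing from the live position onward; prepending a WAV header for the delivered duration would make the downloaded slice a playable file.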

Sonos integration with announcement system

I am currently looking into the possibility of integrating an announcement system with Sonos, but have yet to find a reasonable approach and have started wondering if it is currently possible at all.
My initial approach was having Sonos subscribe to a radio station that would send a constant stream of announcements. After testing this setup I have been unable to get the delay below 3 seconds (which is too long).
I then began looking into the Sonos API. Looking at the documentation and the diagram below, I came to the conclusion that what I was trying to achieve was possible with Sonos.
It does seem, however, that it will require substantial effort to implement a service where I can stream audio to Sonos directly, so I was hoping I could get some things cleared up before I proceed with a rather costly (in time) implementation:
Is it possible to get audio delay below 3 seconds when streaming directly?
Am I correct in understanding that I will need to write an app on the Sonos platform to handle my requests?
If the answer to the above is no, what other options are available?
MSB,
Our current public APIs are really not well designed for what you are trying to do. There are some open source projects which are working off reverse engineered APIs that may work a little better for you.
Just use my library to play a notification.
const SonosDevice = require('@svrooij/sonos').SonosDevice
const sonos = new SonosDevice(process.env.SONOS_HOST || '192.168.96.56')
sonos.PlayNotification({
  trackUri: 'https://cdn.smartersoft-group.com/various/pull-bell-short.mp3', // Can be any URI Sonos understands
  // trackUri: 'https://cdn.smartersoft-group.com/various/someone-at-the-door.mp3', // Cached text-to-speech file.
  onlyWhenPlaying: false, // set to true to play the notification only when music is already playing, so it won't wake you while you're sleeping
  timeout: 10, // if the events don't report when playback stops, or if a stream was playing, revert back after this many seconds
  volume: 15, // volume for the notification (reverted afterwards)
  delayMs: 700 // pause between commands in ms (helps when Sonos fails to play short notification sounds)
})
  .then(played => {
    console.log('Played notification %o', played)
  })
It works in Node/TypeScript and has a lot of options.
