After sending a MediaResponse and starting to stream audio, if the user says "pause", the Assistant pauses the playback, is there a way to get a callback to that event for tracking playback position purposes?
That is not currently supported. The only events you can handle with a callback are FINISHED and STATUS_UNSPECIFIED.
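For the two statuses that are exposed, a rough sketch of such a callback with the actions-on-google Node.js client library could look like the following (the intent name 'media status' and the spoken responses are just placeholders):

const { dialogflow } = require('actions-on-google');
const app = dialogflow();

// Handle the intent mapped to the actions_intent_MEDIA_STATUS event.
app.intent('media status', (conv) => {
  const mediaStatus = conv.arguments.get('MEDIA_STATUS');
  if (mediaStatus && mediaStatus.status === 'FINISHED') {
    // Playback reached the end of the media object.
    conv.ask('Would you like to hear the next track?');
  } else {
    // STATUS_UNSPECIFIED (or no argument at all) -- no pause event or
    // playback position is reported here.
    conv.ask('Sorry, I lost track of the playback status.');
  }
});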
I am developing a collaborative instrument-playing game, where multiple users play an instrument (a synthesizer or a sample, using the WebAudio API). For my first prototype I've set up a keyboard that sends note/volume signals via Socket.io to the server, and when the server gets that signal it sends it back to all connected sockets, which then play the corresponding note.
You might have guessed it: there's a massive amount of lag, and the notes arrive in an inconsistent order.
What are some efficient ways that I can send the output of WebAudio to the server, and have it broadcast to all connected users, so I have some sort of consistency?
You could try using a MediaStream by adding a MediaStreamAudioDestinationNode to your audio node graph as a destination and using that stream with either WebRTC or RecordRTC to send the audio to your server.
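As a rough sketch of the node-graph side (the OscillatorNode stands in for your instrument, and the WebRTC signalling over Socket.io is left out):

// Route the instrument's output into a MediaStream instead of (or as well as)
// the speakers, so it can be attached to a peer connection.
const audioCtx = new AudioContext();
const synth = audioCtx.createOscillator(); // placeholder for your synth/sampler nodes
const streamDest = audioCtx.createMediaStreamDestination();

synth.connect(streamDest);             // feeds the MediaStream
synth.connect(audioCtx.destination);   // optional: keep local monitoring
synth.start();

// streamDest.stream is a normal MediaStream, just like one from getUserMedia.
const pc = new RTCPeerConnection();
streamDest.stream.getAudioTracks().forEach(
  (track) => pc.addTrack(track, streamDest.stream)
);
// ...then do the usual offer/answer exchange over your Socket.io connection.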
Here is some info I found that you could look at.
It talks about using the getUserMedia method, but both getUserMedia and MediaStreamAudioDestinationNode produce a MediaStream object, so the same approach applies. This info
has some ideas on how you could send a MediaStream to your server, although it does say the stream needs to be recorded first rather than sent while it's live and running.
Sending a MediaStream to host Server with WebRTC after it is captured by getUserMedia
I hope this helps :)
Google Chromecast supports external control, such as play, pause, next, previous using both the Google Home app and an Infrared remote (over HDMI CEC).
How can these events be captured in a custom media receiver (using the CAF Receiver API) when the receiver has no media playing?
When no media is playing, the receiver is in IDLE state - that means that a sender is connected and the receiver app is loaded and running, but there is currently no playback, paused playback or buffering operation ongoing.
The messages that can now be intercepted/observed by the receiver are basically the same regardless of whether they were issued by a sender app, Google Home/Assistant, or CEC - and you can process them the same way.
If you want to implement different behavior depending on the device that sent the message (or just track that), you can have a look at the customData section: you can set up your sender app to include some data in it, but you have no influence over what messages issued by Google Home / Google Assistant or CEC look like - customData will be empty there.
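As a sketch of how a CAF receiver can observe those messages (the message types and logging here are only examples of what you might track):

const context = cast.framework.CastReceiverContext.getInstance();
const playerManager = context.getPlayerManager();

// PLAY and PAUSE arrive the same way whether they come from a sender app,
// Google Home / Assistant, or HDMI-CEC.
playerManager.setMessageInterceptor(
  cast.framework.messages.MessageType.PAUSE,
  (request) => {
    console.log('PAUSE received, customData:', request.customData); // empty for Home/CEC
    return request; // hand the request back so default handling continues
  }
);

playerManager.setMessageInterceptor(
  cast.framework.messages.MessageType.PLAY,
  (request) => {
    console.log('PLAY received');
    return request;
  }
);

context.start();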
I'm using Google Cloud Speech API for streaming input.
I need the ability to handle the event that's fired as soon as audio is detected. I think the appropriate one is readable.
Except, I couldn't get it to work yet:
If I attach this listener to the Speech API's streamingRecognize() stream, it behaves the same as the data listener, except without any parameters.
If, on the other hand, I attach the listener to node-record-lpcm16's start() stream, the event works as expected, but the pipe() call seems to stop working, as the Speech API's data event is never fired.
Well, according to this webpage, it's impossible.
It seems that the first event that's fired is data.
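If listening for the first data event is good enough as an "audio detected" signal, a sketch along these lines may work (this assumes @google-cloud/speech together with the older node-record-lpcm16 start() API; option names differ between versions):

const speech = require('@google-cloud/speech');
const record = require('node-record-lpcm16');

const client = new speech.SpeechClient();

const recognizeStream = client
  .streamingRecognize({
    config: {
      encoding: 'LINEAR16',
      sampleRateHertz: 16000,
      languageCode: 'en-US',
    },
    interimResults: true,
  })
  .once('data', () => {
    // First recognition result -- the earliest signal the API gives that
    // speech has been detected in the stream.
    console.log('audio detected');
  })
  .on('data', (data) => {
    const result = data.results[0];
    if (result && result.alternatives[0]) {
      console.log('transcript:', result.alternatives[0].transcript);
    }
  })
  .on('error', console.error);

// Newer versions of node-record-lpcm16 use recorder.record().stream() instead.
record.start({ sampleRateHertz: 16000, threshold: 0 }).pipe(recognizeStream);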
I followed this example and managed to collect the audio buffers from my microphone and send them to Dialogflow.
https://cloud.google.com/dialogflow-enterprise/docs/detect-intent-stream
But this processing is sequential: I first have to collect all the audio buffers before I can send them to Dialogflow.
Then I get the correct result and also the intermediate results, but only after waiting for the person to stop talking before sending the collected audio buffers to Dialogflow.
I would like to send (stream) the audio buffers to Dialogflow instantly, while somebody is still talking, and get the intermediate results right away.
Does anybody know if this is possible and point me in the right direction?
My preferred language is Python.
Thanks a lot!
I got this Answer from the Dialogflow support team:
From the Dialogflow documentation: "Recognition ceases when it detects the audio's voice has stopped or paused. In this case, once a detected intent is received, the client should close the stream and start a new request with a new stream as needed." This means that the user has to stop/pause speaking in order for you to send the audio to Dialogflow. In order for Dialogflow to detect a proper intent, it has to have the full user utterance.
If you are looking for real-time speech recognition, look into our Speech-to-text product (https://cloud.google.com/speech-to-text/).
While trying to do something similar recently, I found that someone already had this problem and figured it out. Basically, you can feed an audio stream to DialogFlow via the streamingDetectIntent method and get intermediate results as valid language is recognized in the audio input. The tricky bit is that you need to set a threshold on your input stream so that the stream is ended once the user stops talking for a set duration. The closing of the stream serves the same purpose as reaching the end of an audio file, and triggers the intent matching attempt.
The solution linked above uses SoX to stream audio from an external device. The nice thing about this approach is that SoX already has options for setting audio level thresholds to start/stop the streaming process (look at the silence option), so you can fine-tune the settings to work for your needs. If you're not using NodeJS, you may need to write your own utility to handle initiating the audio stream, but hopefully this can point you in the right direction.
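For reference, here is a sketch of the streaming call with the Node.js client (projectId, sessionId and the audio settings are placeholders, and the SoX silence/threshold tuning described above is not shown):

const dialogflow = require('dialogflow'); // the older 'dialogflow' npm package
const { Transform } = require('stream');
const record = require('node-record-lpcm16');

const projectId = 'my-gcp-project';  // placeholder
const sessionId = 'some-session-id'; // placeholder

const sessionClient = new dialogflow.SessionsClient();
const sessionPath = sessionClient.sessionPath(projectId, sessionId);

const detectStream = sessionClient
  .streamingDetectIntent()
  .on('error', console.error)
  .on('data', (data) => {
    if (data.recognitionResult) {
      // Intermediate transcription while the user is still talking.
      console.log('interim:', data.recognitionResult.transcript);
    } else if (data.queryResult) {
      // Final result once the stream is closed or the utterance ends.
      console.log('intent:', data.queryResult.intent.displayName);
    }
  });

// The first message carries only the session and audio config; audio follows.
detectStream.write({
  session: sessionPath,
  singleUtterance: true, // let Dialogflow detect the end of the utterance
  queryInput: {
    audioConfig: {
      audioEncoding: 'AUDIO_ENCODING_LINEAR_16',
      sampleRateHertz: 16000,
      languageCode: 'en-US',
    },
  },
});

// Wrap each raw audio chunk as { inputAudio: chunk } and stream it right away.
record
  .start({ sampleRateHertz: 16000, threshold: 0 })
  .pipe(new Transform({
    objectMode: true,
    transform: (chunk, _enc, next) => next(null, { inputAudio: chunk }),
  }))
  .pipe(detectStream);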
I am using Twilio Client (VoIP) with nodejs for making a call from my device to a phone.
I also want recording functionality with this, but I don't see that this API supports it. I see the REST API supports recording, but it doesn't support VoIP.
Can someone please provide some sample code or any help for this?
Very sorry for a silly question, but I am new to programming.
Thanks in advance.
The JavaScript Client doesn't actually make the recording; the TwiML does. You set up your device and establish a connection to Twilio. Audio from your device's microphone is sent to Twilio, and Twilio plays audio through your device's speakers, like on a normal phone call.
This is analogous to the way Twilio handles incoming calls from a real phone. All the same TwiML verbs and nouns that are available for handling Twilio Voice calls are also available for handling Twilio Client connections.
So assuming that you are calling a customer's number and want to record the call, you will need to pass this instruction in the TwiML, i.e.:
<Response><Dial record="true">[Number to call]</Dial></Response>
Or in node.js:
// Sets record="true" on the generated <Dial> verb.
resp.dial({
  record: 'true'
});
After the recording is complete, it gets assigned a Recording SID just like recordings created via the <Record> verb, and you can fetch it via the REST API as documented here:
https://www.twilio.com/docs/api/rest/recording#list
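As a sketch with the Node helper library (the account credentials and Call SID below are placeholders):

const twilio = require('twilio');

// Placeholders -- use your own Account SID and Auth Token.
const client = twilio('ACXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX', 'your_auth_token');

// List the recordings that were made for a particular call.
client.recordings
  .list({ callSid: 'CAXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX', limit: 20 })
  .then((recordings) => {
    recordings.forEach((rec) => {
      console.log(rec.sid, rec.duration);
    });
  })
  .catch(console.error);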