Twilio - Detect when user starts talking - node.js

I've integrated Twilio into a Node.js application and I want to know whether there is any way, using TwiML, to detect when a user starts talking in the middle of a playback or while the synthesizer is talking. For example, when you call an IVR and the IVR is reading out options, if you already know what to say you can choose the option right away instead of waiting for it to finish.

Currently, Twilio does not support speech or voice recognition.
Source: https://www.twilio.com/help/faq/voice/does-twilio-support-speech-recognition
If "choose the option right away" means a key press, then you don't have to wait for the playback to finish.
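As a sketch of that: nesting <Play> or <Say> inside <Gather> lets the caller press a digit while the prompt is still playing. The TwiML below is built by hand so no SDK is needed; the prompt URL and the action path are placeholders.

```javascript
// Sketch: TwiML that accepts a key press during playback.
// The prompt URL and the "action" path are placeholders.
function menuTwiml(promptUrl, actionPath) {
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<Response>',
    // Because <Play> is nested inside <Gather>, Twilio stops the audio
    // as soon as the caller presses a key and POSTs the digit to "action".
    `  <Gather numDigits="1" action="${actionPath}">`,
    `    <Play>${promptUrl}</Play>`,
    '  </Gather>',
    '</Response>',
  ].join('\n');
}

console.log(menuTwiml('https://example.com/menu-prompt.mp3', '/handle-choice'));
```

Serve that XML from your webhook and the digit arrives at the action URL the moment the caller presses it, without waiting for the audio to end.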


Add music to dialogflow

I am trying to add music to my Dialogflow agent. I don't want to add it from the Dialogflow console; I want to add it from the webhook. Can you please tell me how to add the music from the webhook? I am trying this code, but it's not working:
app.intent('Music', (conv) => {
  const speech = '<speak><audio src="soundbank://soundlibrary/ui/gameshow/amzn_ui_sfx_gameshow_countdown_loop_32s_full_01">Did not get the audio file</audio></speak>';
  conv.ask(speech);
});
Also, I want to use an interrupt keyword that will stop this music. Is there a predefined way to do that, or, if it has to be user-defined, how do I interrupt the music and proceed with the rest of my code?
Firstly, to be able to add music, it needs to be hosted on a publicly accessible HTTPS endpoint; see the docs. So make sure you can access your file even in a private browsing mode, such as incognito in Chrome.
Secondly, if you choose to use SSML to play your audio, the audio becomes part of the speech response. By doing this, you won't be able to create any custom interruptions or control over the music. The user can only stop the music by stopping your action or saying "Okay Google" again to interrupt your response.
If you want to allow your users to control the music you send them, have a look at the media responses in the Actions on Google library.
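For illustration, a media response can also be returned as raw webhook JSON instead of through the client library. This is a sketch based on my understanding of the Dialogflow v2 / Actions on Google response format; the track name, audio URL, and spoken text are placeholders.

```javascript
// Sketch of a Dialogflow v2 webhook response carrying a media response;
// the track name and audio URL are placeholders.
function mediaResponsePayload(trackName, audioUrl) {
  return {
    payload: {
      google: {
        expectUserResponse: true,
        richResponse: {
          items: [
            { simpleResponse: { textToSpeech: 'Here is your track.' } },
            {
              mediaResponse: {
                mediaType: 'AUDIO',
                mediaObjects: [{ name: trackName, contentUrl: audioUrl }],
              },
            },
          ],
          // Media responses on screen devices must be accompanied
          // by suggestion chips.
          suggestions: [{ title: 'Stop' }],
        },
      },
    },
  };
}

console.log(JSON.stringify(
  mediaResponsePayload('My track', 'https://example.com/track.mp3'), null, 2));
```

The upside over SSML is that the user gets playback controls and can pause or stop the audio on their own.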

Is it possible to wait until the ask method returns a response

Using actions-on-google, when handling an intent and then using conv.ask() to send a response to the agent, is it possible to wait until the request has been successfully sent and then continue doing something else? Is there a way to await the response of the ask method?
My idea is to have the agent say something, and then manually time playing a sound (mp3) after the ask response has been successfully sent to the agent. Right now it takes a bit of time for the agent to receive the request and say the text, so the sound I play starts well before the agent has said anything.
Is that something possible?
Update
Right now I'm using SSML to make two different voices speak in one intent. The idea is that we have two "personalities" talking, and each personality has a different voice; currently I'm using some SSML attributes to do that. Let's call them P1 and P2. P1 starts by saying something, and as soon as it ends a sound of a blender gets played. Right after the sound plays, the second personality P2 starts talking, and P1 then "replies" to it, but that all happens in one intent response. That's the idea I'm trying to implement.
If you want to play audio immediately after saying something, it sounds more like you want to use a Media response as part of what you're sending back. Your mp3 file must be available at an HTTPS address, although that address can be anything you want as long as the device can resolve it. Since it will be on the same server the webhook is running on, and the webhook has to have a public HTTPS URL, the audio presumably will (or can) be as well.
If your interest is in knowing that latency, you can probably time the difference between when you send the response and when the device requests the mp3 file.
There is no direct way to know when the Assistant has finished saying the text, but you can use tricks with the Media response to get some idea depending on your needs.
Update based on your use case.
If you're doing it all as one response, and it fits in that response, and your audio is just a few seconds long, then you can do it using SSML as a single response. That part seems fine.
If the audio is longer or you want more of a back and forth between your personalities, then you can use the Media response to play the audio (even a very short empty audio). At the end of the audio playing, it sends an event to the Action and you can then continue to the next step in your personalities responding.
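When the Media response finishes, Dialogflow receives an event (actions_intent_MEDIA_STATUS) that you can map to an intent and use to continue the back-and-forth. As a sketch, the status can be pulled out of the raw webhook request; the request shape below is an assumption based on the Actions on Google v2 conversation format, so verify it against a real request body.

```javascript
// Sketch: pull the media status out of a Dialogflow webhook request after
// the actions_intent_MEDIA_STATUS event fires. The request shape here is
// an assumption based on the Actions on Google v2 conversation format.
function mediaStatus(body) {
  const inputs =
    (body.originalDetectIntentRequest &&
      body.originalDetectIntentRequest.payload &&
      body.originalDetectIntentRequest.payload.inputs) || [];
  for (const input of inputs) {
    for (const arg of input.arguments || []) {
      if (arg.name === 'MEDIA_STATUS') {
        // e.g. 'FINISHED' when the audio has played to the end
        return (arg.extension && arg.extension.status) || null;
      }
    }
  }
  return null;
}
```

When the function returns 'FINISHED', your webhook can send the next personality's line as a fresh response, continuing the dialogue step by step.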

How to answer a user in Home device and send a notification to user's phone

Is there any way to answer a user question in the Home device and at the same time, send a notification to another device?
For example, if the user asks for directions, answer with the location (by voice), finish by telling the user "I've sent the location to your phone", and send them a map...
I know there is a way to switch the conversation to another device (one with a screen, for example), but I don't want to end the conversation on the Home device.
Thanks in advance.
Not in a straightforward way, no, but there are a few options that may do what you want.
You can try to use Assistant Notifications. Right now, notifications only appear on mobile devices, but even if/when they allow speaker notifications in the future, your user could still open it on a mobile device. You need to ask for permission to send a notification, and when they trigger the notification, an Intent in your Action will be triggered to actually show what you want to show.
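As a sketch of the first option: after the user grants the notification permission, your server POSTs a push message to the Actions API conversations endpoint. The body shape below follows the Actions on Google push-notification format as I understand it; the user id, intent name, and title are placeholders, and the service-account OAuth token needed for the actual POST is omitted.

```javascript
// Sketch of a push-notification body POSTed to
// https://actions.googleapis.com/v2/conversations:send (auth omitted).
// The userId and target intent are placeholders.
function pushNotification(userId, intentName, title) {
  return {
    customPushMessage: {
      userNotification: { title },
      target: {
        userId,                // obtained when the user grants permission
        intent: intentName,    // the intent triggered when the user taps it
        locale: 'en-US',
      },
    },
  };
}
```

Tapping the notification on the phone then drops the user into the targeted intent of your Action, where you can show the map.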
You can also look into using a more standard notification channel such as Firebase Cloud Messaging. This does require you to have your own app on the mobile device, and it works outside of the Assistant, but may be a good choice if it meets your needs.
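For the FCM route, the message itself is a small JSON document sent through firebase-admin or the FCM HTTP v1 REST API. A minimal sketch, with the device token and content as placeholders:

```javascript
// Sketch of an FCM HTTP v1 message (sent via firebase-admin or the
// REST API); the device token, text, and map URL are placeholders.
function fcmMessage(deviceToken, title, body, mapUrl) {
  return {
    message: {
      token: deviceToken,
      notification: { title, body },
      data: { mapUrl }, // FCM data values must be strings
    },
  };
}
```

Your own app on the phone receives the message and can open the map directly, independently of the Assistant conversation.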

How to know when the user changes the song with the Spotify API

I need to know when the user changes the song that is "currently playing".
Currently, I'm using https://api.spotify.com/v1/me/player/currently-playing to get the information about the song that is "currently playing". But, I need to know when it changes to the next one (not only because the song finished, but also when the user press NEXT SONG button).
My current workaround is to call the https://api.spotify.com/v1/me/player/currently-playing endpoint every second, but I'll hit the rate limit if I do it too often.
You are doing it right. You need to poll the https://api.spotify.com/v1/me/player/currently-playing endpoint to detect changes in the playback state.
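A minimal polling sketch: fetch the currently-playing track at a modest interval and only react when the track id changes. The access token is a placeholder, and global fetch assumes Node 18+; a 5-second interval still catches manual skips while staying well clear of the rate limit.

```javascript
// Returns the new track id when playback has moved to a different track,
// otherwise null.
function trackChanged(prevTrackId, playback) {
  const id = playback && playback.item && playback.item.id;
  return id && id !== prevTrackId ? id : null;
}

// Sketch: poll the currently-playing endpoint and call onChange whenever
// the track changes. accessToken is a placeholder.
async function watchPlayback(accessToken, onChange, intervalMs = 5000) {
  let prevId = null;
  setInterval(async () => {
    const res = await fetch('https://api.spotify.com/v1/me/player/currently-playing', {
      headers: { Authorization: `Bearer ${accessToken}` },
    });
    // 204 means nothing is playing; 429 means we are being rate limited.
    if (res.status !== 200) return;
    const playback = await res.json();
    const newId = trackChanged(prevId, playback);
    if (newId) {
      prevId = newId;
      onChange(playback.item);
    }
  }, intervalMs);
}
```

Comparing item.id rather than progress means a change fires both when a song ends naturally and when the user presses next.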
In some scenarios it can be suitable to use Spotify's web playback SDK, which exposes a player_state_changed event. For this to work the user needs to have a premium account and the playback needs to happen on the device created by the SDK.

Twilio Real Time Recording

I am using a Twilio voice call and want to get the other person's voice data (in the voice call) in real time, so that I can convert it into text, etc.
Please let me know how to achieve this.
I know Twilio has a record-call feature, but:
1) it gives a URL at the end of the call, not in real time;
2) I think it records the whole conversation (of both people).
I am thinking of using Node.js but am not able to find solutions.
Twilio developer evangelist here.
I'm afraid our programmable voice feature doesn't support streaming the audio in real time in any way other than between the callers.
You can, however, use SIP with Twilio. You would need to provide your own PBX to make this work, but it then gives you access to the streaming audio, which you can then work with yourself.
Let me know if this helps at all.
