We are looking to build a Google Action that records short audio snippets (like a voice TODO list) and plays them back later.
Is there any documentation for this?
In short - no. Google does not provide access to the audio stream from the Assistant. You can get the Speech To Text (STT) processed by Google, however, using the Actions on Google API.
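As a rough sketch of what "getting the STT text" can look like for a voice TODO list, the snippet below pulls the transcript out of a webhook request and stores it as a note. The field names (`queryResult.queryText` for a Dialogflow-style request, `inputs[].rawInputs[].query` for an Actions SDK-style request) are my recollection of those request formats and should be checked against the current docs. `extract_transcript`, `save_note`, and the in-memory `todo_notes` list are hypothetical names used only for illustration.

```python
from typing import List, Optional


def extract_transcript(request_body: dict) -> Optional[str]:
    """Return the text produced by Google's speech recognition, or None."""
    # Dialogflow-style webhook request (assumed shape).
    query_result = request_body.get("queryResult")
    if query_result and query_result.get("queryText"):
        return query_result["queryText"]

    # Actions SDK-style conversation request (assumed shape).
    for item in request_body.get("inputs", []):
        for raw in item.get("rawInputs", []):
            if raw.get("query"):
                return raw["query"]

    return None


# In-memory stand-in for real storage (hypothetical; use a database in practice).
todo_notes: List[str] = []


def save_note(request_body: dict) -> bool:
    """Store the transcript as a TODO note; return False if none was found."""
    text = extract_transcript(request_body)
    if text is None:
        return False
    todo_notes.append(text)
    return True
```

Note that "playing back" a note stored this way means having the Assistant read the text aloud, not replaying the user's original recording.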
Related
I want to create a Dialogflow agent, deployed on the Google Assistant, that gets a phone number from a backend service and calls that number through the Google Assistant. Is this possible?
You can only play sound files or play streams. The Google Assistant also doesn't provide you with the actual sound that was recorded as it always converts the detected sound to text. This text is then delivered to your Action.
You could, however, place the call from your back end using Twilio and have it speak the text that was detected. Responding to whatever the person you're calling says would be hard as well.
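To make the Twilio idea a little more concrete: Twilio calls are driven by an XML dialect called TwiML, and its `<Say>` verb reads text aloud to the person called. The sketch below only builds that TwiML document from the detected text using the standard library. Actually placing the call (through Twilio's REST API or helper library, with your account credentials) is not shown, and `build_twiml_say` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def build_twiml_say(message: str) -> str:
    """Build a minimal TwiML document that speaks `message` to the callee."""
    response = ET.Element("Response")
    say = ET.SubElement(response, "Say")
    # ElementTree escapes characters such as & and < when serializing.
    say.text = message
    return ET.tostring(response, encoding="unicode")


if __name__ == "__main__":
    # The text would normally come from the Assistant's speech-to-text result.
    print(build_twiml_say("Your order is ready for pickup."))
```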
I usually opt for sending text messages instead of calling when using Actions.
The platform does not support the ability to programmatically call telephone numbers through the user's phone.
Google Cloud's Text-to-Speech API has a WaveNet model whose output, in my opinion, sounds much better than the standard speech. This model can be used in Dialogflow agents (Settings > Speech > Text To Speech), which results in the generated speech being included in the DetectIntentResponse. However, I can find no way to use this speech with the Actions on Google integration, i.e. in an actual Google Assistant app. Have I overlooked something, or is this really not possible? If it is not possible, does anyone know when they plan to enable it?
In the Actions console, going to the Invocation page lets you select a TTS voice.
All of the voices can be demoed on the Languages & Locales page of the docs, and the vast majority of them use WaveNet voices.
I’m wondering how I can create a music player for my Google Assistant compatible devices (e.g. Google Home Mini, my tablet, my phone). I’ve been researching how to do this, but I’ve only found things like using Dialogflow, Node.js, and/or Actions on Google with Firebase Cloud Functions. I’m new to all this; I was motivated by Spotify, Pandora, and similar services. I also tried looking up how they do it, but I found nothing. If any of you know how to do it, please help me.
In addition to all that, I am a bit confused about the whole Dialogflow and Actions on Google integration, but that’s easier to fix than the overall question.
If this isn’t “solvable”, is there a way to do it with Dialogflow fulfillment?
To create something like Spotify or Pandora, you need to partner with Google to create a media action. These are different from the conversational actions that you can create using Actions on Google and Dialogflow.
If you want to create a conversational action with Actions on Google and Dialogflow that produces long-form audio as part of the conversation, look into the Media response, which you can include in your replies.
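For orientation, here is a minimal sketch of a Dialogflow fulfillment reply that carries a Media response. The payload layout (`payload.google.richResponse.items` with a `simpleResponse` followed by a `mediaResponse`, plus suggestion chips when the conversation stays open) is how I remember the webhook format, so treat every field name as an assumption to verify against the Media response documentation. The function name and the example URL are placeholders.

```python
import json


def build_media_reply(speech: str, name: str, audio_url: str,
                      description: str = "") -> dict:
    """Build a webhook reply that introduces and then plays one audio file."""
    return {
        "payload": {
            "google": {
                # Keep the conversation open after the audio (assumption).
                "expectUserResponse": True,
                "richResponse": {
                    "items": [
                        # A spoken introduction placed before the media item.
                        {"simpleResponse": {"textToSpeech": speech}},
                        {
                            "mediaResponse": {
                                "mediaType": "AUDIO",
                                "mediaObjects": [
                                    {
                                        "name": name,
                                        "description": description,
                                        "contentUrl": audio_url,
                                    }
                                ],
                            }
                        },
                    ],
                    # Suggestion chips for screen devices (assumed requirement).
                    "suggestions": [{"title": "Stop"}],
                },
            }
        }
    }


if __name__ == "__main__":
    reply = build_media_reply(
        "Here is your episode.",
        "Episode 1",
        "https://example.com/audio/episode1.mp3",  # placeholder URL
    )
    print(json.dumps(reply, indent=2))
```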
I'm new to Actions on Google and am currently doing R&D. I've created an audio skill on Alexa, and now I want the same for the Google Assistant. But I have a few questions:
1- Can we return audio in a response? My audio files are about 1 hour long, so can we play them in our action? Alexa has an audio player. Is there anything like that in the Assistant?
2- I didn't find any SDK, but developers are talking about one, so there must be something. Kindly share the link.
Thanks in anticipation.
Update:
I believe the SDK is actions-on-google. I haven't explored it yet, but it's the SDK I found for creating actions with Node.js.
Link: actions-on-google
Actions support SSML, which allows playback of audio files: https://developers.google.com/actions/reference/ssml#support_for_ssml_elements
At the moment there is a maximum duration of 120 seconds for all supported audio formats, but if your audio is longer you can break it up and play the pieces in sequence.
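A minimal sketch of the "break it up and play in sequence" idea: work out how many clips a recording needs under the 120-second limit quoted above, then emit one SSML `<audio>` element per clip inside a single `<speak>` document. The clip URLs are placeholders, and the audio is assumed to have been split into separate files beforehand. Whether one response accepts this many clips is not stated here, so a long recording may need to be spread over several turns.

```python
from typing import List
from xml.sax.saxutils import quoteattr

MAX_CLIP_SECONDS = 120  # limit quoted in the answer above


def clips_needed(total_seconds: int, max_seconds: int = MAX_CLIP_SECONDS) -> int:
    """Number of clips required to cover `total_seconds` (ceiling division)."""
    return -(-total_seconds // max_seconds)


def build_ssml(clip_urls: List[str]) -> str:
    """Wrap each clip URL in an <audio> element so the clips play in order."""
    # quoteattr escapes the URL and adds the surrounding quotes.
    audio_tags = "".join(f"<audio src={quoteattr(url)}/>" for url in clip_urls)
    return f"<speak>{audio_tags}</speak>"


if __name__ == "__main__":
    count = clips_needed(5 * 60)  # a five-minute recording
    # Placeholder URLs; the files must already exist as separate clips.
    urls = [f"https://example.com/audio/part{i}.ogg" for i in range(1, count + 1)]
    print(build_ssml(urls))
```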
If you have your own NLU, you can use the Actions SDK. If you don't have your own NLU, then you can use API.AI to create an action.
A node.js client library is available for either of these options: https://github.com/actions-on-google/actions-on-google-nodejs
For any other developer questions, you should look at the actions documentation: https://developers.google.com/actions/develop/conversation
From the docs it seems like SpeechResponse is the only documented type of response you can return:
https://developers.google.com/actions/reference/conversation#SpeechResponse
Is it possible to load an image or some other type of media in the Assistant conversation via API.AI or the Actions SDK? This seems to be supported in API.AI for Facebook and other messengers:
https://docs.api.ai/docs/rich-messages#image
Thanks!
As of today, the Actions SDK supports Conversation Actions, which provide a voice UI integrated with Google Home.
The API.AI integration with Actions on Google can be checked out here; it currently shows no support for images in the response.
When they provide an integration with Google Allo, they might start supporting images, videos, etc. in the messaging interface.
That feature seems to be present now. You can look it up in the docs at https://developers.google.com/actions/assistant/responses
Note: Images are supported only on devices with visual output, so Google Home cannot show them. Devices with a screen do support a card with an image.
Pro tip: Yes, you can.
What you want to do is represent your image or video as a URL within API.AI and render that URL as an image or video within your app.
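As a sketch of how a webhook might handle both kinds of device, the snippet below checks whether the request reports a screen and only then adds an image card to the reply. The capability name `actions.capability.SCREEN_OUTPUT`, its location under `originalDetectIntentRequest.payload.surface.capabilities`, and the `basicCard` fields are my recollection of the Dialogflow webhook format and should be confirmed against the responses documentation linked above. The function names are hypothetical.

```python
SCREEN_CAPABILITY = "actions.capability.SCREEN_OUTPUT"  # assumed capability name


def has_screen(request_body: dict) -> bool:
    """Return True if the calling device reports a visual output surface."""
    surface = (
        request_body.get("originalDetectIntentRequest", {})
        .get("payload", {})
        .get("surface", {})
    )
    return any(
        cap.get("name") == SCREEN_CAPABILITY
        for cap in surface.get("capabilities", [])
    )


def build_reply(request_body: dict, speech: str, title: str,
                image_url: str, alt_text: str) -> dict:
    """Always speak; add an image card only when a screen is available."""
    items = [{"simpleResponse": {"textToSpeech": speech}}]
    if has_screen(request_body):
        items.append({
            "basicCard": {
                "title": title,
                "image": {"url": image_url, "accessibilityText": alt_text},
            }
        })
    return {
        "payload": {
            "google": {
                "expectUserResponse": True,
                "richResponse": {"items": items},
            }
        }
    }
```

On a speaker-only device such as Google Home the reply contains just the spoken part, which matches the note above.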
see working example