If possible, how can I allow the user to record their voice when interacting with my app (or Action), and then perform some actions on that voice recording (convert speech to text, for example)?
I know that I can specify parameters and extract some information, but what if I want to capture everything that the user has said, or just allow the user to record a message in any language and then perform whatever I want to do on it?
Developers only get the transcription of what the user said, not the actual audio.
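If your goal is to work with everything the user said, the closest available substitute is that per-request transcription. A minimal sketch, assuming a Dialogflow fulfillment built with the actions-on-google Node.js library:

```javascript
// Minimal sketch: the webhook receives only the speech-to-text transcription,
// never the audio. conv.input.raw holds the full raw text of the user's turn.
const { dialogflow } = require('actions-on-google');
const app = dialogflow();

// A fallback handler catches any utterance no other intent matched.
app.fallback((conv) => {
  const transcript = conv.input.raw; // everything the user said, as text
  conv.ask(`You said: ${transcript}`);
});
```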
Hi, I'm not a coder and need guidance.
I'm creating a simple skill for Google Assistant in Dialogflow where the goal is to get a user's email. However, when I test it out verbally in the Google Actions console, most of the time it picks up the wrong email address (I'll say nhs.com and it thinks I'm saying something different), even though I have put example emails in the entities section.
What is the solution around this? Is it possible to ask permission in Dialogflow to get a user's data? I think Google Assistant says you can only do that (account linking) if you build in Google Assistant. Could you ask the user to verbally spell out their email address? Although I have no idea how you would go about doing that.
It is not recommended to ask the user to say their email. Emails can have a very difficult structure, consisting of arbitrary characters and numbers. Because of this, Google provides you with the option to retrieve the user's details via account linking. I've listed some options for retrieving an email below.
1) Google Sign-in (Requires Code)
Since you said you aren't a coder, it will be a bit challenging to get the user's email easily. Your best option would be to use Google Sign-In account linking. This provides your bot with a flow that automatically asks the user for permission to use their email.
That said, you might still need some code, since I do not know whether Dialogflow supports retrieving the user's email out of the box when using account linking.
The benefit of Google Sign-In is that you will get the active email from the user's Google profile.
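For completeness, here is a minimal sketch of that flow with the actions-on-google Node.js library, assuming Google Sign-In account linking is configured in the Actions console; the intent names are assumptions, and 'Get Sign In' would be tied to the actions_intent_SIGN_IN event in Dialogflow:

```javascript
// 'YOUR_CLIENT_ID' is the OAuth client ID from the account linking setup.
const { dialogflow, SignIn } = require('actions-on-google');
const app = dialogflow({ clientId: 'YOUR_CLIENT_ID' });

// Hypothetical intent that starts the sign-in flow.
app.intent('Ask For Sign In', (conv) => {
  conv.ask(new SignIn('To fetch your email'));
});

// Hypothetical intent tied to the actions_intent_SIGN_IN event.
app.intent('Get Sign In', (conv, params, signin) => {
  if (signin.status === 'OK') {
    conv.ask(`Thanks! I'll use ${conv.user.email}.`); // email from the Google profile
  } else {
    conv.ask("No problem, we'll continue without your email.");
  }
});
```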
2) Regex entity (Requires some technical knowledge about regex)
Dialogflow supports a feature called regex entities. With these entities you provide a regular expression, and Dialogflow looks through the user input for that pattern. If the input matches the pattern, the match is extracted from the input. In your case you would need a regex that checks for an email pattern.
With a regex entity, the user can be prompted to say their email. With this approach you won't be certain that it actually is their real email, and you might have to add a flow to double-check that there weren't any typos in the email.
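One commonly used (deliberately loose) email pattern you could adapt for such an entity, shown here as a quick JavaScript test:

```javascript
// A loose email pattern; fully RFC-compliant email regexes are far longer.
const emailPattern = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/;

console.log(emailPattern.test('jane.doe@nhs.com')); // true
console.log(emailPattern.test('not an email'));     // false
```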
3) Email entity (Least technical option)
As Rally mentioned in the comments, Dialogflow also supports an email entity. This can be used to automatically detect an email in the user's input. Though it is an easy option to use, I've noticed that it doesn't always detect every email, and since you can't improve its behavior, it might not be the best choice. It is definitely the least technical option, but it might not always work.
So I was using the Gmail Google Action on my Assistant device, and one thing I found particularly intriguing was that after the user has given their message and email, the Assistant offers options for changing or adding to the message. On selecting those options, the Assistant changes or edits the value of the message entity as per the user's command, without re-triggering the whole intent.
My question is: how can I implement this functionality in my Google Action? Is there a particular function made by Google that I can use, or do I have to create one from scratch?
Gmail's integration is a bit different than the way a third-party developer would do it, but you can save user input from the current session and modify it later in your conversation. In your Action, this wouldn't be just one intent, but a few that handle the work of modifying some session data and finally using that data to complete the user's original request.
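A rough sketch of that pattern with the actions-on-google Node.js library; the intent names, the message parameter, and the sendEmail helper are all hypothetical:

```javascript
// One intent saves a draft in session storage, another edits it, and a final
// intent uses it, all without re-triggering the original intent.
const { dialogflow } = require('actions-on-google');
const app = dialogflow();

app.intent('Compose Message', (conv, { message }) => {
  conv.data.draft = message;           // conv.data lives for the session
  conv.ask('Got it. Say "add" to append more, or "send" to finish.');
});

app.intent('Add To Message', (conv, { addition }) => {
  conv.data.draft += ' ' + addition;   // modify the saved draft in place
  conv.ask('Added. Anything else?');
});

app.intent('Send Message', (conv) => {
  sendEmail(conv.data.draft);          // hypothetical helper that delivers the mail
  conv.close('Your message has been sent.');
});
```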
I need to gather the name and mobile number of the user before they start interacting with the chatbot. How can I do that?
You will need to create two parameters, name and phoneNumber, to capture the name and phone number.
Using @sys.phone-number you can capture phone numbers from the user.
Using @sys.given-name or @sys.last-name you can capture the user's name.
Please note: currently only US English names are supported by Dialogflow's system entities, so there may be cases where you cannot capture every name.
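For reference, a small sketch of a fulfillment reading those two parameters with the dialogflow-fulfillment library; the intent name 'Collect Contact Details' is an assumption:

```javascript
const { WebhookClient } = require('dialogflow-fulfillment');

exports.webhook = (request, response) => {
  const agent = new WebhookClient({ request, response });

  // Reads the name and phoneNumber parameters captured by the system entities.
  function collectDetails(agent) {
    const { name, phoneNumber } = agent.parameters;
    agent.add(`Thanks ${name}, we'll use ${phoneNumber} to contact you.`);
  }

  const intentMap = new Map();
  intentMap.set('Collect Contact Details', collectDetails);
  agent.handleRequest(intentMap);
};
```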
To gather user information like name and mobile number before the user starts interacting with the chatbot, you need to use events.
Step 1: Trigger an event when your app initializes, for example from your Node.js server or Angular front end.
Events are another way to trigger an intent without user interaction. When your intent is triggered, you can ask for the information.
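A sketch of step 1 using the official @google-cloud/dialogflow Node.js client; the event name 'WELCOME' and the project/session IDs are assumptions:

```javascript
// Triggers a Dialogflow intent by event from a Node.js server, with no
// user utterance required.
const dialogflow = require('@google-cloud/dialogflow');

async function triggerWelcomeEvent(projectId, sessionId) {
  const sessionClient = new dialogflow.SessionsClient();
  const session = sessionClient.projectAgentSessionPath(projectId, sessionId);

  const [response] = await sessionClient.detectIntent({
    session,
    queryInput: {
      event: { name: 'WELCOME', languageCode: 'en-US' },
    },
  });

  // The intent matched by the event can now prompt for name and mobile number.
  return response.queryResult.fulfillmentText;
}
```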
Twilio provides documentation that explains how to create interactive voice experiences, for example how to prompt the caller for a key-press and offer different menus or perform actions based on it.
However, I cannot find any information on how I might fetch data from a third-party service based on user input.
For example, suppose a user enters their zip code on the keypad; I would like to fetch the weather from a weather API and return it to the user in speech form.
Is this possible? And if so, how?
Very possible. You can take a look at the documentation below, but the key widget is the HTTP Request widget.
Studio Widget Library
https://www.twilio.com/docs/studio/widget-library#http-request
The relevant line is:
"JSON: If your HTTP Widget returns valid JSON, you should be able to access it via widgets.MY_WIDGET_NAME.parsed" variable.
Studio User Guide - Working with Variables
https://www.twilio.com/docs/studio/user-guide#working-with-variables
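As a rough sketch of your weather example (the endpoint URL, query parameter, and widget names like gather_zip and get_weather are all assumptions): a Gather widget collects the zip code, the HTTP Request widget calls a small service you host, and a Say widget reads the parsed JSON back.

```javascript
// Hypothetical Node.js/Express service the HTTP Request widget could call, e.g.
//   GET https://example.com/weather?zip={{widgets.gather_zip.Digits}}
const express = require('express');
const app = express();

app.get('/weather', async (req, res) => {
  const zip = req.query.zip;                  // keypad digits from the Gather widget
  const forecast = await lookUpForecast(zip); // hypothetical call to a weather API
  res.json({ forecast });                     // e.g. { "forecast": "Sunny and 72" }
});

app.listen(3000);
```

A Say widget placed after the HTTP Request widget could then speak {{widgets.get_weather.parsed.forecast}} back to the caller.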
We have a framework that implements chatbot / voice assistant logic for handling complex conversations in the health domain. Everything is implemented on our server side. This gives us full control of how responses are generated.
The channel (such as Alexa or Facebook Messenger cloud) calls our webhook:
When the user sends a message, the platform forwards it to our webhook as a hashed user ID plus the message text (chat message or transcribed voice).
Our webhook responds with an appropriately structured response, which includes text to be displayed or spoken, possibly choice buttons, some images, etc. It also includes a flag indicating whether the current session has finished or user input is expected.
Integrating a new channel involves converting the returned response into the form expected by that channel and setting some flags (has voice, has display, etc.).
This simple framework has worked so far for Facebook Messenger, Cortana, Alexa (a little bit of hacking was needed to abandon its intent and slot recognition), and our web chatbot.
We wanted to write a thin layer of support for a Google Assistant Action.
Is there any way of passing all the input from Assistant user intact into a webhook such as the one described above and taking full control of the way responses are generated and the end of conversation is determined?
I'd rather not delve into API.AI's cumbersome ways of structuring a conversation, which seem fine for trivial scenarios such as ordering an Uber but very bad for longer conversations.
Since you already have a Natural Language Understanding layer for your system, you don't need API.AI/Dialogflow, and you can skip this layer completely. (The NLU is useful, even for large and extensive conversations, but doesn't make sense in your case where you've already defined the conversation through other means.)
You'll need to use the Actions SDK (sometimes known as actions.json after the configuration file it uses) to define triggering phrases, but after that you'll get all the text that the user says as part of your conversation through a webhook that delivers JSON to you. You'll reply with JSON that contains the text/audio response, images on cards, possibly suggestion chips, etc.
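As a rough sketch of such a thin layer (assuming the actions-on-google Node.js client library; actions.intent.MAIN and actions.intent.TEXT are the Actions SDK's built-in intents, while myFramework.respond is a hypothetical stand-in for your server-side logic):

```javascript
// Thin Actions SDK webhook that forwards raw user text to an existing framework.
const { actionssdk } = require('actions-on-google');
const express = require('express');

const app = actionssdk();

// Invocation ("Talk to My Health Bot") lands here.
app.intent('actions.intent.MAIN', (conv) => {
  conv.ask('Hi, how can I help?');
});

// Every later turn arrives here with the full transcribed user text.
app.intent('actions.intent.TEXT', async (conv, input) => {
  const reply = await myFramework.respond(conv.id, input); // conv.id: conversation ID
  if (reply.endOfSession) {
    conv.close(reply.text);   // end the conversation
  } else {
    conv.ask(reply.text);     // keep the mic open for the next user turn
  }
});

express().use(express.json()).post('/webhook', app).listen(3000);
```

Your webhook stays in full control: it decides the reply text and whether to end the conversation, exactly as with your other channels.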