Unless I've done something majorly stupid, it appears I only have one entry point into my Action on Google using the Actions SDK and Node.js.
Consequently, I have to work out what the user has said by using some keywords with .indexOf() and then calling the appropriate function.
I thought this would be simpler and that there would be a way to define an action with several phrases, with Google being intelligent enough to work it all out even if the user said something slightly different.
I guess one of the things I'm doing wrong, or differently, is having a welcome intent that essentially starts a conversation and asks "What would you like to do?" The user responds, and then I have to work out what was said and follow up with the appropriate action.
That seems quite long-winded. Any better ways?
The "better way" is to use a tool that is designed for that and has a powerful and flexible Natural Language Processing engine associated with it. Actions directly support both Dialogflow and Converse.AI, and most other NLP engines should be able to provide information about how they work with Actions.
Dialogflow, for example, lets you specify some sample phrases that will match an Intent, and then supplements those with phrases "similar" to the ones you've specified. Your Node.js webhook gets told which Intent was matched, along with the parameters you've defined for that Intent, and you can take action based on that information directly.
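For illustration, here is a minimal sketch of such a webhook using the actions-on-google Node.js library; the "Check Spending" intent and its category parameter are hypothetical examples, not anything Dialogflow provides out of the box:

// A minimal sketch of a Dialogflow fulfillment webhook using the
// actions-on-google client library. The intent names and the
// "category" parameter are hypothetical examples.
const { dialogflow } = require('actions-on-google');
const express = require('express');
const bodyParser = require('body-parser');

const app = dialogflow();

// Dialogflow has already decided which Intent matched, so each
// handler only deals with one kind of request.
app.intent('Default Welcome Intent', (conv) => {
  conv.ask('Welcome! What would you like to do?');
});

app.intent('Check Spending', (conv, { category }) => {
  // "category" is a parameter you would define on the Intent.
  conv.ask(`Looking up your spending on ${category}...`);
});

// Standard Express plumbing to expose the webhook.
const server = express().use(bodyParser.json(), app);
server.listen(process.env.PORT || 3000);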
At this point, the Actions SDK is mostly intended to be used as the base that these and other NLP engines build on top of.
Related
I'm trying to ascertain the "right tool for the job" here. I believe Cognitive Services can do this, but without disappearing down an R&D rabbit-hole I thought I'd make sure I was tunnelling in the right direction first.
So, here is the brief:
I have a collection of known existing phrases which I want to look for, but these might be written in slightly different ways, be that grammar or language.
I want to be able to parse a (potentially large) volume of text to scan and look for those phrases so that I can identify them.
For example, my phrase could be "the event will be in person" but that also needs to identify different uses of language; for example "in-person event", "face to face event", or "on-site event" - as well as the various synonyms and variations you can get with such things.
LUIS initially appeared to be the go-to tool for this kind of thing, and includes the ability to write your own Features (aka Phrase Lists) to augment the model, but it isn't clear whether that would hit the brief - LUIS appears to be much more about "intent" and user interaction (for example building a chat Bot, or understanding intent from emails).
Text Analytics also seems a likely candidate, but again seems more focused about identifying "entities" (such as people / places / organisations) rather than a natural language "phrase" - would this tool work if I was defining my own "Topics" or is that really just barking up the wrong tree?
...or is there actually something else entirely that I should be looking at?
At this point, I'm really looking for an answer to "which tool should I spend lots of time learning about?"
Thanks all in advance - I appreciate this is a fairly open-ended requirement.
It seems your scenario aligns more with our Text Analytics service. I was going to recommend the Key Phrase Extraction API, which evaluates unstructured text and returns a list of key phrases. However, since you need to use a known (custom) phrase list, it may not be the solution you're looking for. We don't support custom key phrase extraction today, but it's on our roadmap. If you're interested, we can connect you with the product team to learn more about your scenario.
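For reference, here is a rough sketch of calling that API from Node.js with the @azure/ai-text-analytics SDK, so you can judge how close the built-in extraction gets to your phrase list; the endpoint and key are placeholders for your own resource:

// Rough sketch: Key Phrase Extraction via the @azure/ai-text-analytics
// SDK. The endpoint and key are placeholders for your own resource.
const { TextAnalyticsClient, AzureKeyCredential } = require('@azure/ai-text-analytics');

const client = new TextAnalyticsClient(
  'https://<your-resource>.cognitiveservices.azure.com/',
  new AzureKeyCredential('<your-key>')
);

async function keyPhrases(documents) {
  const results = await client.extractKeyPhrases(documents);
  for (const result of results) {
    if (!result.error) {
      console.log(result.keyPhrases); // list of extracted phrases
    }
  }
}

keyPhrases(['The event will be in person at our head office.']);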
Update:
Please try the custom NER capability.
I am working on a cooking recipe app for Google Home and I need a way to string several GoogleResponses (SimpleResponse, etc.) together without requiring user interaction between them.
I have searched for other answers pertaining to this, and while I have found a few similar questions to mine, the replies tend to be along the lines of "the system was designed for dialogues so what would be the point?".
I fully understand this point of view, however because of the nature and behaviour requirements of the app that I am developing I find myself in need of this particular possibility.
The recipes are divided into steps (revolutionary, I know..) and there is roughly a 1 to 1 correspondence between steps and GoogleResponses.
To give an example of how a typical recipe unfolds it is usually like this (this is a simplification of course):
main content -> question -> main content -> question -> etc..
With each instance of "main content" being a step of the recipe and each "question" requiring user input.
If it was just like this all the time then there would not be a problem; I could just bundle each "main content -> question" section into one GoogleResponse and be done.
However there are often times where the recipe flows more like:
main content -> main content -> main content -> question
With each "main content" being a step in the recipe, it does not make sense in this context to bundle them together into the same response (there is a system for the user to move back and forth between steps).
I was originally using MediaResponses for the "main content" sections, as those do not require user input to move on to the next step, but for various reasons I won't go into here (this is already getting quite long), the project manager has decided that MediaResponses should not be used in this project.
The short answer is the one you already encountered - trying to make conversational actions not-so-conversational doesn't work very well. However, there are a few things you can look into.
Recipe Structured Data
Since you're working on a recipe action, specifically, it may be worthwhile to use the standard recipe support that comes with the Assistant.
On the upside - people will be familiar with it, and you don't need to write much code, just provide markup on a webpage.
On the downside - if you have other requirements for how you want the interaction to go, it isn't that flexible. (For example, if you're asking questions at some of the recipe points, or if you want to offer measurement adjustments based on number of people to serve.)
Misuse the "No Input" event
You can configure dynamic reprompts so you get an event if the user doesn't say anything after a few seconds. If users want to move ahead faster, they can ask for the next step explicitly, or you can catch the actions_intent_NO_INPUT event in Dialogflow and advance yourself.
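A rough sketch of what that could look like in fulfillment with the actions-on-google library; the intent name and the step bookkeeping in conv.data are hypothetical:

// Sketch: advancing a recipe when the user stays silent, by mapping a
// Dialogflow intent to the actions_intent_NO_INPUT event. The intent
// name and the conv.data step bookkeeping are hypothetical.
const { dialogflow } = require('actions-on-google');
const app = dialogflow();

// In the Dialogflow console, create an intent whose Event is
// "actions_intent_NO_INPUT" and give it the name used below.
app.intent('recipe.no.input', (conv) => {
  const repromptCount = parseInt(conv.arguments.get('REPROMPT_COUNT'), 10);
  const isFinal = conv.arguments.get('IS_FINAL_REPROMPT');

  if (isFinal) {
    // The Assistant closes the conversation after the final reprompt.
    conv.close('Happy cooking! Ask me again when you want the next step.');
  } else {
    // Treat silence as "next step" and move on.
    conv.data.step = (conv.data.step || 0) + 1;
    conv.ask(`Step ${conv.data.step}: ...`); // hypothetical step text
  }
});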
There are a few downsides here:
Not all devices support no-input. Mobile devices, for example, won't generate this event.
This may only be valid for two no-input events in a row. On the third event, the Assistant may automatically close the conversation. (The documentation is unclear on this, and the exact behavior has changed over time.)
Media Response
You're not clear on why the Media Response "shouldn't be used", but it is one of the only ways to trigger an event when speaking has completed (there is a sketch of the pattern after the list below).
There are several downsides, however:
There are a number of bugs with Media Response around quitting
On devices with screens, there is a media player. Since the media itself is incidental to what you're doing, having the player doesn't make sense
It isn't supported on all surfaces
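As mentioned above, the usual pattern looks roughly like this sketch: play a short (even near-silent) audio clip, then catch the media status event when it finishes. The audio URL, intent names, and step text are placeholders:

// Sketch: using a Media Response to get a callback once playback (and
// hence the preceding speech) finishes. URLs and names are placeholders.
const { dialogflow, MediaObject, Suggestions } = require('actions-on-google');
const app = dialogflow();

app.intent('recipe.step', (conv) => {
  conv.ask('Step 2: fold the egg whites into the batter.');
  // A short, quiet audio file; when it ends, the Assistant sends a
  // media status event instead of waiting for the user to speak.
  conv.ask(new MediaObject({
    name: 'Continue',
    url: 'https://example.com/silence.mp3', // placeholder
  }));
  // Suggestion chips are required alongside a media response on screens.
  conv.ask(new Suggestions('Next step', 'Repeat'));
});

// In Dialogflow, attach this intent to the actions_intent_MEDIA_STATUS event.
app.intent('recipe.media.status', (conv) => {
  const mediaStatus = conv.arguments.get('MEDIA_STATUS');
  if (mediaStatus && mediaStatus.status === 'FINISHED') {
    conv.ask('Step 3: pour the batter into the tin.'); // placeholder
  }
});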
Interactive Canvas
A similar approach, however, would be to use the Interactive Canvas. This gives you an HTML page with JavaScript that you control, including the ability to send responses to the server as if the user had spoken them (or as if they had touched a suggestion chip). You can also listen for events that fire when the generated speech has finished.
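On the web-page side, that might look something like the sketch below. It assumes (per the Interactive Canvas docs) that an 'END' mark fires when TTS playback completes; the 'next step' query is whatever phrase your Dialogflow agent expects:

// Sketch of the client-side (web page) half of an Interactive Canvas
// action: when the generated speech finishes, silently send a query
// back to the Action as if the user had asked for the next step.
const callbacks = {
  onUpdate(data) {
    // State pushed from your fulfillment; the shape is whatever you send.
    console.log('canvas update', data);
  },
  onTtsMark(markName) {
    // The Assistant fires an 'END' mark when TTS playback completes.
    if (markName === 'END') {
      interactiveCanvas.sendTextQuery('next step');
    }
  },
};
interactiveCanvas.ready(callbacks);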
There are, however, a number of downsides which probably prevent you from using this right now:
The biggest is that the Interactive Canvas can only be used for games right now. (But this seems to be a policy decision, rather than a technical one. So perhaps it will be lifted in the future.)
It does not work on smart speakers - only some devices with screens.
Combining the above approaches
One way to get around the device limitations of the Interactive Canvas and the poor visuals that accompany the Media Response might be to mix the two. For devices that support the Interactive Canvas, use that. If not, try using the Media Response. (You may even wish to consider the no-input reprompt for some platforms.)
But this still won't work on all devices, and still has the limitation that Interactive Canvas is only for games right now.
Summary
There is no single, clear way to handle this, and this isn't a feature they are likely to add given the conversational nature of the platform. However, there may be some workarounds that work for your scenario.
If I have an intent like
'How much have I spent on'
I'd like it to match basically 'how much have I spent on ___________________'
where ____ could be any word or phrase.
I've been doing this sort of thing for some intents that do some fuzzy matching to determine what the user is speaking of and it works ok, but is it possible to do this in a reliable way, without requiring that a particularly specific phrase was uttered (which defeats the purpose of NLU to a degree)?
I have been looking for a keyword and assuming the "topic" is the remainder of the phrase, and it works, but it seems like it will be prone to problems when the actual user doesn't say more or less what I intended.
I imagine I can reorganize to do this with a follow-up intent, like "what category?", and then treat the entire response as what I was trying to parse out; I was just looking to avoid that if there was some sort of built-in support for this concept.
Thanks!
I think you are on the right path.
You can use the @sys.any entity to capture any word or phrase. Depending on your use-case and what the intent is, you can add a few variations of the sentence how much have I spent on @sys.any to the training phrases.
You can also make use of slot-filling or some other fallback mechanism to validate the user input.
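In fulfillment, the captured phrase then arrives as an ordinary parameter. A minimal sketch, assuming the intent is named spending.query and the @sys.any parameter is named category (both names are up to you):

// Sketch: reading an @sys.any parameter in fulfillment. The intent and
// parameter names ('spending.query', 'category') are whatever you set up.
const { dialogflow } = require('actions-on-google');
const app = dialogflow();

app.intent('spending.query', (conv, { category }) => {
  if (!category) {
    // Fallback validation: ask for the missing piece.
    conv.ask('What would you like to check your spending on?');
    return;
  }
  conv.ask(`Here is what you have spent on ${category}...`);
});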
I am trying to make a search algorithm with Dialogflow that could take any combination of first name, address, phone number, zip code, or city as input. The user does not need to provide all of them; we will refine our search with each additional answer until we have only one result. Basically, we are trying to identify which customer we are talking to.
How should this type of intent (or set of intents) be structured? We have tried one intent with multiple parameters, but we do not need all of them to be required. We have also written a JavaScript function for fulfillment, but how can we communicate back to Dialogflow whether we need more information?
Thank you very much for your help.
Slot filling is designed for this purpose.
Hope that helps.
Please post more code/details to help answers be more specific.
First, keep in mind that Intents reflect what the user is saying, and not typically what you're replying with or what other information you need. Slot filling sometimes bends this rule, but only if you have required slots.
Since you don't - you need a different approach.
This can be done with a single intent, although you may find that multiple intents make it easier in some ways. The approach is broadly the same:
When you ask the question, make sure you set an Outgoing Context with a relatively short lifespan (2-3 is good) to indicate you are collecting user info.
Create an Intent (or Intents) that have sample phrases that capture the information you need.
Some of these will have obvious entity types (phone number and zip code) while others will be more difficult (First name has a system entity type, but it doesn't include all possible first names).
You will need to create sample phrases that collect the parameters by themselves, along with phrases that combine them in ways that make sense. You're the best judge of this, and you should probably write some sample conversations before you write the phrases.
In your fulfillment, you'll figure out if you have enough information.
If you do, you can reply and clear the Context that was set. (Clearing it is important so Dialogflow doesn't match the information-collecting Intent again.)
If you do not, you can add the information you have as parameters to the Context so you can save it for later processing, make sure you reset the Context lifespan (so it doesn't expire), and prompt the user for additional information. Again, having a conversation mocked out ahead of time will help here.
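Putting that together, the fulfillment might look something like this sketch; the intent, context, and parameter names are illustrative, and findCustomers() stands in for your real search:

// Sketch: accumulating customer-lookup parameters in a Context across
// turns. All names here are illustrative, not a fixed API.
const { dialogflow } = require('actions-on-google');
const app = dialogflow();
const CTX = 'collecting_customer_info';

app.intent('customer.lookup', (conv, params) => {
  // Merge whatever was collected on earlier turns with this turn's input.
  const ctx = conv.contexts.get(CTX);
  const known = Object.assign({}, ctx && ctx.parameters, pruneEmpty(params));

  const matches = findCustomers(known);
  if (matches.length === 1) {
    conv.contexts.delete(CTX); // clear it so this Intent stops matching
    conv.ask(`Found you, ${matches[0].firstName}!`);
  } else {
    conv.contexts.set(CTX, 3, known); // reset the lifespan and save progress
    conv.ask('I found several matches. Could you give me your zip code or phone number?');
  }
});

// Drop parameters the user didn't actually supply this turn.
function pruneEmpty(params) {
  const out = {};
  for (const [key, value] of Object.entries(params)) {
    if (value) out[key] = value;
  }
  return out;
}

// Stand-in for your real customer search (e.g. a CRM or database query).
function findCustomers(fields) {
  return []; // placeholder
}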
I am making a chat bot to answer questions on a particular subject (for example, physics). How would you structure all the possible questions as intents in Dialogflow?
I am considering the following two methods:
Method 1: make each question a unique intent.
Pro: Dialogflow can easily match user input to the specific question using a low confidence score threshold, and each question can have multiple training phrases.
Con: There will be tons of intents, and maintaining them might be a nightmare. I might also reach the maximum number of intents.
Method 2: group all the questions into one "asking questions" intent and use an entity to identify the specific question being asked.
Pro: Only one "asking questions" intent is needed, which is neater and easier to maintain.
Con: Entity detection might be stricter and less robust.
I would suggest you try the Knowledge Base feature of Dialogflow.
You can give it multiple web-page links from which it can gather all the questions, or you can manually prepare a list and upload it to Dialogflow.
That way you don't need to make separate intents; it will try to match the questions automatically.
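If you end up calling the API yourself rather than using a built-in integration, the knowledge base is passed as a query parameter; knowledge connectors are only exposed in the v2beta1 API. A sketch with the Node.js client, where the project, session, and knowledge base IDs are placeholders:

// Sketch: querying an agent with a Knowledge Base attached, using the
// v2beta1 Dialogflow client. The IDs below are placeholders.
const dialogflow = require('@google-cloud/dialogflow');

async function askKnowledgeBase(question) {
  const sessions = new dialogflow.v2beta1.SessionsClient();
  const session = sessions.projectAgentSessionPath('my-project', 'my-session');

  const [response] = await sessions.detectIntent({
    session,
    queryInput: {
      text: { text: question, languageCode: 'en' },
    },
    queryParams: {
      knowledgeBaseNames: ['projects/my-project/knowledgeBases/MY_KB_ID'],
    },
  });

  const { knowledgeAnswers } = response.queryResult;
  if (knowledgeAnswers && knowledgeAnswers.answers.length) {
    console.log(knowledgeAnswers.answers[0].answer);
  }
}

askKnowledgeBase('What is the speed of light?');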
Let me know if you have any confusion.
This looks like an FAQ type chatbot. You can develop the chatbot in 2 ways:
Use Prebuilt Agents - go to the prebuilt agents section, select and import the FAQ agent, and add your intents.
Use Knowledge Base approach - This is in Beta mode right now, but super easy to build.
a. You need to enable Beta Features from the agent settings
b. Go to Knowledge Base in the left menu, create a new document, and upload a CSV file of question/answer pairs. You can also provide a link to a Q&A page if you have one.
Check out the documentation for more details.
Knowledge Base seems to be the best way, but it only supports English content.