LUIS List entity - azure

I am using "list" entity. However, I do not achieve my expected result.
Here is what I have for LUIS intent:
getAnimal
I want to get a cat**[animal]**.
Here is what I have with LUIS entities:
List Entities [animal]
cat: russian blue, persian cat, british shorthair
dog: bulldog, german shepard, beagle
rabbit: holland lop, american fuzzy lop, florida white
Here is what I have with LUIS Phrase lists:
Phrase lists [animal_phrase]
cat, russian blue, persian cat, british shorthair, dog, bulldog, german shepard, beagle, etc
Desired:
When user enters "I want to get a beagle." It will be match with "getAnimal" intent.
Actual:
When user enters "I want to get a beagle." It will be match with "None" intent.
Please help. Your help will be appreciated.

So using a phrase list is a good way to start, however you need to make sure you provide enough data for LUIS to be able to learn the intents as well as the entities separate from the phrase list. Most likely you need to add more utterances.
Additionally, if your end goal is to have LUIS recognize the getAnimal intent, I would do away with the list entity, and instead use a simple entity to take advantage of LUIS's machine learning, and do so in combination with a phrase list to boost the signal to what an animal may look like.
As the documentation on phrase lists states,
Features help LUIS recognize both intents and entities, but features
are not intents or entities themselves. Instead, features might
provide examples of related terms.
--Features, in machine learning, being a distinguishing trait or attribute of data that your system observes, and what you add to a group/class when using a phrase list
Start by
1. Creating a simple entity called Animal
2. Add more utterances to your getAnimal intent.
Following best practices outlined here, you should include at least 15 utterances per intent. Make sure to include plenty of examples of the Animal entity.
3. Be mindful to include variation in your utterances that are valuable to LUIS's learning (different word order, tense, grammatical correctness, length of utterance and entities themselves). Highly recommend reading this StackOverflow answer I wrote on how to build your app properly get accurate entity detection if you want more elaboration.
above blue highlighted words are tokens labeled to the simple Animal entity
3. Use a phrase list.
Be sure to include values that are not just 1 word long, but 2, 3, and 4 words long in length, as different animal names may possibly be that long in length (e.g. cavalier king charles spaniel, irish setter, english springer spaniel, etc.) I also included 40 animal breed names. Don't be shy about adding Related Values suggested to you into your phrase list.
After training your app to update it with your changes, prosper!
Below "I want a beagle" reaches the proper intent. LUIS will even be able to detect animals that were not entered in the app in entity extraction.

Related

Dialogflow Detect Intent Matching Confidence And Entity Value Selection

I am looking to understand how detect intent confidence impacts entity value selection in Dialogflow. For example, using two user generated phrases:
Phrase 1: "For snack yesterday I had an apple and peanut butter". This phrase has an intent detection confidence of '1' and 'snack' and 'yesterday' are tagged correctly to their respective entities, and the foods, 'apple' and 'peanut butter' are correctly matched within their entity [food], with values of 'apple' and 'peanut butter' respectively.
Phrase 2: "Are the snack yesterday I had an apple and peanut butter". This phrase was mumbled by the user or garbled by Siri (we use an iOS voice app). Here the intent detection confidence is '0.852' and while 'snack' and 'yesterday' are tagged to their entities correctly, the foods are not treated as above. Specifically, while both are tagged to the correct entity [food] and 'apple' was correctly tagged to 'apple', the 'peanut' of 'peanut butter' was tagged as one food [value = 'peanut'] and the 'butter' of 'peanut butter' was tagged as another food [value = 'butter'].
As context we have ~500 intents, the intent matched above has ~400 training phrases (clearly not including 'Are the...') and ~200 entities, the largest of which has 29,998 values.
So it appears the intent detection confidence impacts the entity parameter value matching. Can anyone shed any light on this? From our viewpoint, it is not a useful 'feature'. Quite the opposite.
When searching for a matching intent, Dialogflow scores potential matches with an intent detection confidence, also known as the confidence score.
These values range from 0.0 (completely uncertain) to 1.0 (completely certain). Without taking the other factors described in this document into account, once intents are scored, there are three possible outcomes:
If the highest scoring intent has a confidence score greater than or equal to the ML Classification Threshold setting, it is returned as a match.
If no intent meets the threshold, a fallback intent is matched.
If no intents meet the threshold and no fallback intent is defined, no intent is matched.
Update: As per the GCP DialogFlow development team:
Scores are referred to intent recognition confidence
Parameter extraction is not taken into account when scores are computed"
In other words, there is no relationship between intent classification confidence and entity extraction.
The described behavior could potentially be a bug within DialogFlow or something specific to your GCP project and further inspection for your GCP project is required with GCP Support to investigate why this is happening. You can create a GCP Support Case.
Over the last few weeks we have interacted at some length with GCP Support who've interacted with the DF engineering team.
The short answer is they indicate the entity value extraction is 'as designed', and specifically, when using a composite entity (a nested entity structure), values that match input terms (here, 'peanut', 'butter' and 'peanut butter') are extracted and matched at random. So a user's utterance of 'peanut butter' may be matched to 'peanut' and 'butter' or 'peanut butter' at random.
This behavior cannot be controlled by adding additional training phrases to the agent.
From our point of view, the behavior is not as desired but we understand the implications for our design. Hopefully this 'answer' helps others navigate this and similar issues.

how to validate user expression in dialogflow

I have created a pizza bot in dialogflow. The scenario is like..
Bot says: Hi What do you want.
User says : I want pizza.
If the user says I want watermelon or I love pizza then dialogflow should respond with error message and ask the same question again. After getting a valid response from the user the bot should prompt the second like
Bot says: What kind of pizza do you want.
User says: I want mushroom(any) pizza.
If the user gives some garbage data like I want icecream or I want good pizza then again bot has to respond with an error and should ask the same question. I have trained the bot with the intents but the problem is validating the user input.
How can I make it possible in dialogflow?
A glimpse of training data & output
If you have already created different training phrases, then invalid phrases will typically trigger the Fallback Intent. If you're just using #sys.any as a parameter type, then it will fill it with anything, so you should define more narrow Entity Types.
In the example Intent you provided, you have a number of training phrases, but Dialogflow uses these training phrases as guidance, not as absolute strings that must be matched. From what you've trained it, it appears that phrases such as "I want .+ pizza" should be matched, so the NLU model might read it that way.
To narrow exactly what you're looking for, you might wish to create an Entity Type to handle pizza flavors. This will help narrow how the NLU model will interpret what the user will say. It also makes it easier for you to understand what type of pizza they're asking for, since you can examine just the parameters, and not have to parse the entire string again.
How you handle this in the Fallback Intent depends on how the rest of your system works. The most straightforward is to use your Fulfillment webhook to determine what state of your questioning you're in and either repeat the question or provide additional guidance.
Remember, also, that the conversation could go something like this:
Bot says: Hi What do you want.
User says : I want a mushroom pizza.
They've skipped over one of your questions (which wasn't necessary in this case). This is normal for a conversational UI, so you need to be prepared for it.
The type of pizzas (eg mushroom, chicken etc) should be a custom entity.
Then at your intent you should define the training phrases as you have but make sure that the entity is marked and that you also add a template for the user's response:
There are 3 main things you need to note here:
The entities are marked
A template is used. To create a template click on the quote symbol in the training phrases as the image below shows. Make sure that again your entity is used here
Make your pizza type a required parameter. That way it won't advance to the next question unless a valid answer is provided.
One final advice is to put some more effort in designing the interaction and the responses. Greeting your users with "what do you want" isn't the best experience. Also, with your approach you're trying to force them into one specific path but this is not how a conversational app should be. You can find more about this here.
A better experience would be to greet the users, explain what they can do with your app and let them know about their options. Example:
- Hi, welcome to the Pizza App! I'm here to help you find the perfect pizza for you [note: here you need to add any other actions your bot can perform, like track an order for instance]! Our most popular pizzas are mushroom, chicken and margarita? Do you know what you want already or do you need help?

Map to the wrong LUIS intent

I am facing an issue whereby words that does not match with any intents, it will assume it belongs to intent with the most labeled utterances.
Example: if
Intent A consists of utterances such as Animals
Intent B consists of utterances such as Fruits
Intent C consists of utterances such as Insects
Intent D consists of utterances such as People Name
Desired: If the random word(s) does not fit into any of the luis intent, it will fit into none luis intent. Example of desired: If word such as "emotions" or "clothes" were entered, it will match as "None" intent.
Actual: When user type random word(s), it match with luis intent with highest number of labeled utterances. If word such as "emotions" was entered, it will match as "A" intent as intent A consist of highest number of labeled utterances.
Please advise on the issue.
Set a score threshold, below which your app won't show any response to the user (or could show a "sorry I didn't get you" message instead). This avoid responding to users with anything LUIS is unsure about, which usually takes care of a lot of "off topic" input too.
I would suggest setting it your threshold between 0.3 and 0.7, depending on the seriousness of your subject matter. This is not a configuration option in LUIS, rather in your code you just do:
if(result.score >=0.5) {
// show response based on intent.
} else {
// ask user to rephrase
}
On a separate note, it looks like your intents are very imbalanced. You want to try and have roughly the same number of utterances for each intent, between 10 and 20 ideally.
So without more details on how you've built your language model, most likely the underlying issue is that you either don't have enough utterances in each intent that have enough variation displaying the different ways in which different utterances could be said for that particular intent.
And by variation I mean different lengths of the utterance (word count), different word order, tenses, grammatical correctness, etc. (docs here)
And remember each intent should have at least 15 utterances.
Also, as stated in best practices, do did you also make sure to include example utterances in your None intent as well? Best practices state that you should have 1 utterances in None for every 10 utterances in the other parts of your app.
Ultimately: build your app so that your intents are distinct enough with varying example utterances built into the intent, so that when you test other utterances LUIS will be more likely able to match to your distinct intents--and if you enter an utterance that doesn't follow any sort of pattern or context of your distinct intents, LUIS will know to detect the utterance to your fallback "None" intent.
If you want more specific help, please post the JSON of your language model.

How to determine if a piece of text mentions a product

I'm new to natural language process so I apologize if my question is unclear. I have read a book or two on the subject and done general research of various libraries to figure out how i should be doing this, but I'm not confident yet that know what to do.
I'm playing with an idea for an application and part of it is trying to find product mentions in unstructured text (e.g. tweets, facebook posts, emails, websites, etc.) in real-time. I wont go into what the products are but it can be assumed that they are known (stored in a file or database). Some examples:
"starting tomorrow, we have 5 boxes of #hersheys snickers available for $5 each - limit 1 pp" (snickers is the product from the hershey company [mentioned as "#hersheys"])
"Big news: 12-oz. bottles of Coke and Pepsi on sale starting Fri." (coca-cola is the product [aliased as "coke"] from coca-cola company and Pepsi is the product from the PepsiCo company)
"#OMG, i just bought my dream car. a mustang!!!!" (mustang is the product from Ford)
So basically, given a piece of text, query the text to see if it mentions a product and receive some indication (boolean or confidence number) that it does mention the product.
Some concerns I have are:
Missing products because of misspellings. I thought maybe i could use a string similarity check to catch these.
Product names that are also English words or things would get caught. Like mustang the horse versus mustang the car
Needing to keep a list of alternative names for products (e.g. "coke" for "coco-cola", etc.)
I don't really know where to start with this but any help would be appreciated. I've already looked at NLTK and SciKit and didn't really gleam how to do this from there. If you know of examples or papers that explain this, links would be helpful. I'm not specific to any language at this point. Java preferably but Python and Scala are acceptable.
The answer that you chose is not really answering your question.
The best approach you can take is using Named Entity Recognizer(NER) and POS tagger (grab NNP/NNPS; Proper nouns). The database there might be missing some new brands like Lyft (Uber's rival) but without developing your own prop database, Stanford tagger will solve half of your immediate needs.
If you have time, I would build the dictionary that has every brands name and simply extract it from tweet strings.
http://www.namedevelopment.com/brand-names.html
If you know how to crawl, it's not a hard problem to solve.
It looks like your goal is to classify linguistic forms in a given text as references to semantic entities (which can be referred to by many different linguistic forms). You describe a number of subtasks which should be done in order to get good results, but they nevertheless are still independent tasks.
Misspellings
In order to deal with potential misspellings of words, you need to associate these possible misspellings to their canonical (i.e. correct) form.
Phonetic similarity: Many reasons for "misspellings" is opacity in the relationship between the word's phonetic form (i.e. how it sounds) and its orthographic form (i.e. how it's spelled). Therefore, a good way to address this is to index terms phonetically so that e.g. innovashun is associated with innovation.
Form similarity: Additionally, you could do a string similarity check, but you may introduce a lot of noise into your results which you would have to address because many distinct words are in fact very similar (e.g. chic vs. chick). You could make this a bit smarter by first morphologically analyzing the word and then using a tree kernel instead.
Hand-made mappings: You can also simply make a list of common misspelling → canonical_form mappings. This would work well for "exceptions" not handled by the above methods.
Word-sense disambiguation
Mustang the car and Mustang the horse are the same form but refer to entirely different entities (or rather classes of entities, if you want to be pedantic). In fact, we ourselves as humans can't tell which one is meant unless we also know the word's context. One widely-used way of modelling this context is distributional lexical semantics: Defining a word's semantic similarity to another as the similarity of their lexical contexts, i.e. the words preceding and succeeding them in text.
Linguistic aliases (synonyms)
As stated above, any given semantic entity can be referred to in a number of different ways: bathroom, washroom, restroom, toilet, water closet, WC, loo, little boys'/girls' room, throne room etc. For simple meanings referring to generic entities like this, they can often be considered to be variant spellings in the same way that "common misspellings" are and can be mapped to a "canonical" form with a list. For ambiguous references such as throne room, other metrics (such as lexical-distributional methods) can also be included in order to disambiguate the meaning, so that you don't relate e.g. I'm in the throne room just now! to The throne room of the Buckingham Palace is beautiful.
Conclusion
You have a lot of work to do in order to get where you want to go, but it's all interesting stuff and there are already good libraries available for doing most of these tasks.

Streets recognition, deduction of severity

I'm trying to make an analysis of a set of phrases, and I don't know exactly how "natural language processing" can help me, or if someone can share his knowledge with me.
The objective is to extract streets and localizations. Often this kind of information is not presented to the reader in a structured way, and It's hard to find a way of parsing it. I have two main objectives.
First the extraction of the streets itself. As far as I know NLP libraries can help me to tokenize a phrase and perform an analysis which will get nouns (for example). But where a street begins and where does it ends?. I assume that I will need to compare that analysis with a streets database, but I don't know wich is the optimal method.
Also, I would like to deduct the level of severity , for example, in car accidents. I'm assuming that the only way is to stablish some heuristic by the present words in the phrase (for example, if deceased word appears + 100). Am I correct?
Thanks a lot as always! :)
The first part of what you want to do ("First the extraction of the streets itself. [...] But where a street begins and where does it end?") is a subfield of NLP called Named Entity Recognition. There are many libraries available which can do this. I like NLTK for Python myself. Depending on your choice I assume that a streetname database would be useful for training the recognizer, but you might be able to get reasonable results with the default corpus. Read the documentation for your NLP library for that.
The second part, recognizing accident severity, can be treated as an independent problem at first. You could take the raw words or their part of speech tags as features, and train a classifier on it (SVM, HMM, KNN, your choice). You would need a fairly large, correctly labelled training set for that; from your description I'm not certain you have that?
"I'm assuming that the only way is to stablish some heuristic by the present words in the phrase " is very vague, and could mean a lot of things. Based on the next sentence it kind of sounds like you think scanning for a predefined list of keywords is the only way to go. In that case, no, see the paragraph above.
Once you have both parts working, you can combine them and count the number of accidents and their severity per street. Using some geocoding library you could even generalize to neighborhoods or cities. Another challenge is the detection of synonyms ("Smith Str" vs "John Smith Street") and homonyms ("Smith Street" in London vs "Smith Street" in Leeds).

Resources