LUIS: Identify "normalized" value based on synonyms automatically - nlp

I am currently developing a chatbot that recommends theater plays to the user.
It is working pretty well, but now I want to enable the user to get recommendations based on the type of theater plays (like funny, dramatic, sad).
Nevertheless, as I do not know how exactly the user is phrasing the request also synonyms might be used (funny: witty, humourous, ...)
What is a good solution to get these types from the user's request in a normalized way?
Typically I would use the List entity, but then I have to insert all synonyms for each possible value by myself. Is there a way how i can define my "normalized" values and synonyms are automatically matched by LUIS (and improved by further training of the model)

Related

Searching for known phrases in text using Azure Cognitive Services

I'm trying to ascertain the "right tool for the job" here, and I believe Cognitive Services can do this but without disappearing down an R&D rabbit-hole I thought I'd make sure I was tunnelling in the right direction first.
So, here is the brief:
I have a collection of known existing phrases which I want to look for, but these might be written in slightly different ways, be that grammar or language.
I want to be able to parse a (potentially large) volume of text to scan and look for those phrases so that I can identify them.
For example, my phrase could be "the event will be in person" but that also needs to identify different uses of language; for example "in-person event", "face to face event", or "on-site event" - as well as the various synonyms and variations you can get with such things.
LUIS initially appeared to be the go-to tool for this kind of thing, and includes the ability to write your own Features (aka Phrase Lists) to augment the model, but it isn't clear whether that would hit the brief - LUIS appears to be much more about "intent" and user interaction (for example building a chat Bot, or understanding intent from emails).
Text Analytics also seems a likely candidate, but again seems more focused about identifying "entities" (such as people / places / organisations) rather than a natural language "phrase" - would this tool work if I was defining my own "Topics" or is that really just barking up the wrong tree?
.. or ... is there actually something else I should be looking at completely different?
At this point - I'm really looking for a "which tool should I spend lots of time learning about".
Thanks all in advance - I appreciate this is a fairly open-ended requirement.
It seems your scenario aligns more with our text analytics service. I was going to recommend Key Phrase Extraction API which evaluates unstructured text and returns a list of key phrases. However, since you require to use known (custom) phrase list, it may not be the solution you're looking for. We currently don't support custom key phrase extraction today, however it's on our roadmap. If interested, we can connect you with the product team to learn more about your scenario.
Updated:
Please try custom NER capability.

Can chatbots learn or unlearn while chatting with trusted users

Can chatbots like [Rasa] learn from the trusted user - new additional employees, product ids, product categories or properties - or unlearn when these entities are no longer current ?
Or do I have to go through formal data collection, training sessions, testing (confidence rates > given ratio), before the new version be made operational.
If you have entity values that are being checked against a shifting list of valid values, it's more scalable to check those values against a database that is always up to date (e.g. your backend systems probably have a queryable list of current employees). Then if a user provides a value that used to be valid and now isn't, it will act the same as if a user provided an invalid value in the first place.
This way, the entity extraction can stay the same regardless of if some training examples go out of relevance -- though of course it's always good to try to keep your data up to date!
Many Chatbots do not have such a function. Except avanced ones like Alexa, with the keyword "Remember" available 2017 +/-. The user wants Alexa to commit to memory certain facts.
IMHO such a feature is a mark of "intelligence". It is not trivial to implement in ML systems where coefficients in their neural network models are updated by back-propagation after passing learning examples. Rule-based systems (such as CHAT80 a QA system on geography) store their knowledge in relations that can be updated more transparently.

Is it possible to use DialogFlow simply to parse text?

Is it possible to use DialogFlow to simply parse some text and return the entities within that text?
I'm not interested in a conversation or bot-like behaviour, simply text in and list of entities out.
The entity recognition seems to be better with DialogFlow than Google Natural Language Processing and the ability to train might be useful also.
Cheers.
I've never considered this... but yeah, it should be possible. You would upload the entities with synonyms. Then, remove the "Default Fallback Intent", and make a new intent, called "catchall". Procedurally generate sentences with examples of every entity being mentioned, alone or in combination (in whatever way you expect to need to extract them). In "Settings", change the "ML Settings" so the "ML Classification Threshold" is 0.
In theory, it should now classify every input as "catchall", and return all the entities it finds...
If you play around with tagging things as sys.any, this could be pretty effective...
However, you may want to look into something that is built for this. I have made cool stuff with Aylien's NLP API. They have entity extraction, and the free tier gives you 1,000 hits per day.
EDIT: If you can run some code, instead of relying on SaaS, you could check out Rasa NLU for Entity Extraction. With a SpaCy backend it would do well on recognizing pre-trained entities, and with a different backend you can use custom entities.

Entity over-generalisation on Api.ai

We’ve been having a great deal of difficulty with chatbot entities over-generalising on Api.ai, i.e. returning values that have not been specified for that entity when using the “Define Synonyms” feature on custom entities, even when the “Allow automated expansion” flag is turned off.
Our key example is an entity we use for confirming a user choice called confirm_accept. We had an entry: “that’s it”, with synonyms: “thats it”, “that is it”, “that’s it thanks”, “thats it thanks”, “that is it thanks”. This entity value was being returned unexpectedly in expressions where just a stray “it” was appearing.
In general, we have seen a lot of inappropriate entity generalisation which seems to indicate there is some form of stop word removal and stemming/lemmatization going on during entity identification... and which can’t be turned off.
This returns poor entity classifications, making it difficult to create entities for which very precise values are important, e.g. where a single word or character can make a big difference in meaning. Our key use case involves a lot of address processing, so it is important we get back only values we have specified.
Types of over-generalisations we’ve seen include:
inappropriate identification of determiners (a, an, the, this, that, etc.) as part of entities: as in “it” returning “that’s it”
stemmed words: as in stray mentions of “driving”, returning “drive” (a valid street type entity)
inappropriate plural stems: a stray mention of “children” returning “child”, or a stray “will” returning “wills” (which in our case “child” and “wills” are street name entities, so we don’t want “children” or “will” to be returned)
This is currently making it difficult to create a production quality chatbot using the Api.ai service.
Anyone had more luck at either getting a response from Api.ai or solving the over-generalisation problem?
Entities are meant to extract information from conversation:
API.AI's entities are meant to be used to extract data from conversational input not parse different phrases and parts of speech. For your examples (that’s it, thats it, that is it, that’s it thanks, thats it thanks, that is it thanks) all seem to indicate that the user's intent is to indicate that the last message from the API.AI agent was correct. For instances like these, it would be best to use these phrases as examples for an intent or an existing intent with other responses indicating that the user wants to indicate that the last response was correct.
API.AI captures entity tenses and plurals automatically: To address your other concern (driving entity, returning drive value, children returning child, or wills returning wills): API.AI intentionally captures different tenses and plurals of entities to provide a better experience for users who many not know the exact entities you've entered in your database. This allows users of your conversational app to have a natural conversation with your users and not require precise wording or

suggest list of how-to articles based on text content

I have 20,000 messages (combination of email and live chat) between my customer and my support staff. I also have a knowledge base for my product.
Often times, the questions customers ask are quite simple and my support staff simply point them to the right knowledge base article.
What I would like to do, in order to save my support staff time, is to show my staff a list of articles that may likely be relevant based on the initial user's support request. This way they can just copy and paste the link to the help article instead of loading up the knowledge base and searching for the article manually.
I'm wondering what solutions I should investigate.
My current line of thinking is to run analysis on existing data and use a text classification approach:
For each message, see if there is a response with a link to a how-to article
If Yes, extract key phrases (microsoft cognitive services)
TF-IDF?
Treat each how-to as a 'classification' that belongs to sets of key phrases
Use some supervised machine learning, support vector machines maybe to predict which 'classification, aka how-to article' belongs to key phrase determined from a new support ticket.
Feed new responses back into the set to make the system smarter.
Not sure if I'm over complicating things. Any advice on how this is done would be appreciated.
PS: naive approach of just dumping 'key phrases' into search query of our knowledge base yielded poor results since the content of the help article is often different than how a person phrases their question in an email or live chat.
A simple classifier along the lines of a "spam" classifier might work, except that each FAQ would be a feature as opposed to a single feature classifier of spam, not-spam.
Most spam-classifiers start-off with a dictionary of words/phrases. You already have a start on this with your naive approach. However, unlike your approach a spam classifier does much more than a text search. Essentially, in a spam classifier, each word in the customer's email is given a weight and the sum of weights indicates if the message is spam or not-spam. Now, extend this to as many features as FAQs. That is, features like: FAQ1 or not-FAQ1, FAQ2 or not-FAQ2, etc.
Since your support people can easily identify which of the FAQs an e-mail requires then using a supervised learning algorithm would be appropriate. To reduce the impact of any miss-classification errors, then consider the application presenting a support person with the customer's email followed by the computer generated response and all the support person would have to-do is approve the response or modify it. Modifying a response should result in a new entry in the training set.
Support Vector Machines are one method to implement machine learning. However, you are probably suggesting this solution way too early in the process of first identifying the problem and then getting a simple method to work, as well as possible, before using more sophisticated methods. After all, if a multi-feature spam classifier works why invest more time and money in something else that also works?
Finally, depending on your system this is something I would like to work-on.

Resources