Virtual Assistant -> LUIS, QnA, Dispatcher best practice - azure

I have some questions about best practices for certain issues that we are facing with LUIS and QnA Maker, in particular with the Dispatcher:
1) Is there any best practice for the case where we have more than 15k utterances in the Dispatcher? That looks like a limitation of LUIS apps, but it makes the scalability of the model questionable in the long run.
2) Bing Spell Check for LUIS changes names and surnames, for example. How can we avoid this? I guess that Bing Spell Check is necessary for chatbots, since typos are always just around the corner, but using it on names is dangerous.
3) Cross-validation is not supported out of the box. You would have to split your data into folds with custom code (not difficult), use the command line to train and publish your model on k-1 of the k folds, then send the utterances of the held-out fold to the API one by one. Batch upload is only supported through the UI https://cognitive.uservoice.com/forums/551524-language-understanding-luis/suggestions/20082157-add-api-to-batch-test-model and is limited to a test set of 1,000 utterances. If we use the one-by-one approach, we pay $1.50 per 1k transactions https://azure.microsoft.com/de-de/pricing/details/cognitive-services/language-understanding-intelligent-services/ and this means that to get cross-validation metrics for 5 folds, for example, we could be paying about $20 for a single experiment with our current data, and more if we add more data.
4) The model is a black box, which doesn't give us the ability to use custom features if needed.
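To make point 3 concrete, here is a minimal sketch of the custom fold-splitting code and the resulting cost estimate. The function names and the 15,000-utterance figure are assumptions for illustration only; training, publishing and the actual endpoint calls are left out.

```python
import random

def k_fold_splits(utterances, k=5, seed=0):
    """Yield (train, test) pairs; each utterance lands in exactly one test fold."""
    shuffled = list(utterances)
    random.Random(seed).shuffle(shuffled)
    # Fold i takes every k-th item starting at offset i.
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [u for j, fold in enumerate(folds) if j != i for u in fold]
        yield train, test

def prediction_cost(n_utterances, price_per_1k=1.50):
    """Across all k folds every utterance is sent to the endpoint exactly once."""
    return n_utterances * price_per_1k / 1000

# Hypothetical data set size, chosen to match the 15k limit discussed above.
utterances = [f"utterance {i}" for i in range(15000)]
splits = list(k_fold_splits(utterances, k=5))
print(len(splits), len(splits[0][0]), len(splits[0][1]))  # 5 12000 3000
print(prediction_cost(len(utterances)))                   # 22.5
```

Because each utterance is tested once regardless of k, the endpoint cost depends on the data set size rather than on the number of folds.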

I will try to address your concerns in the best possible way I can as follows:
1) As per the LUIS documentation,
Hence, you cannot exceed the limit. In the case of Dispatch apps, if the total number of utterances exceeds 15k, Dispatch will down-sample the utterances to keep the total under 15k. There is an optional CLI parameter (--doAutoActiveLearning) that enables auto active learning, which down-samples intelligently (removing non-relevant utterances).
--doAutoActiveLearning: (optional) Default to false. LUIS limit on training-set size is 15000. When a LUIS app has much more utterances for training, Dispatch's auto active learning process can intelligently down sample the utterances.
2) Bing Spell Check helps users to correct misspelled words in utterances before LUIS predicts the score and entities of the utterance. However, if you want to avoid using the Bing Spell Check API service, then you will need to add both the correct and the incorrect spellings, which can be done in two ways:
Label example utterances that have all the different spellings so that LUIS can learn the proper spelling as well as the typos. This option requires more labeling effort than using a spell checker.
Create a phrase list with all variations of the word. With this solution, you do not need to label the word variations in the example utterances.
3) As per the current documentation, a maximum of 1,000 utterances is allowed per test. The data set is a JSON-formatted file containing a maximum of 1,000 labeled non-duplicate utterances. You can test up to 10 data sets in an app. If you need to test more, delete a data set and then add a new one. I would suggest reporting this as a feature request in the feedback forum.
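If you stay with the UI batch test, the limits above (1,000 non-duplicate utterances per data set, 10 data sets per app) can be respected by preparing the files up front. The following is only a sketch: the "text" and "intent" field names and the case-insensitive notion of a duplicate are assumptions, so check them against the batch file format you actually upload.

```python
def build_test_sets(utterances, max_per_set=1000, max_sets=10):
    """Deduplicate labeled utterances and split them into data sets within the limits.

    Returns (active, pending): the sets that fit in the app now, and the sets
    that can only be uploaded after an existing data set has been deleted.
    """
    seen = set()
    unique = []
    for item in utterances:
        key = item["text"].strip().lower()  # assumed definition of "duplicate"
        if key not in seen:
            seen.add(key)
            unique.append(item)
    sets = [unique[i:i + max_per_set] for i in range(0, len(unique), max_per_set)]
    return sets[:max_sets], sets[max_sets:]

# Hypothetical labeled utterances; 3,000 items of which 2,500 are unique.
data = [{"text": f"book flight {i % 2500}", "intent": "BookFlight"} for i in range(3000)]
active, pending = build_test_sets(data)
print([len(s) for s in active], len(pending))  # [1000, 1000, 500] 0
```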
Hope this helps.

Related

Cognitive Service show Fill words and hide personal data

We use the Azure Batch Transcription Service to get the Transcript of an Audio / Speech.
We noticed that filler words like "uhm", "hm" or similar are sometimes included, but very rarely. We have been using this service for a few months already, and we have the feeling that it "got less" over time (so fewer "uhm"s in the transcript).
Q1: Is there a way to get the filler words? We want to receive them within the transcript.
Also, as we sometimes record conversations, it can happen that someone says a name or talks about other personal information.
Q2: Is there a way to "filter" that personal information / those words out of the transcript?
Sorry, I don't think there is a way to filter personal data / words during transcription. We can only apply a profanity filter for batch transcription.
But I agree this feature would be very helpful. I will forward this feature request to the product group to see if we can have this in the future.
What I would suggest is to post-process the transcript as a last step to filter out the sensitive information.
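As a rough illustration of such a post-processing step, the sketch below redacts e-mail addresses, phone-like digit runs, and names from a list you supply. None of this is part of the transcription service; the patterns and the known_names list are assumptions and will miss plenty of real-world cases.

```python
import re

# Deliberately simple, illustrative patterns (assumptions, not exhaustive).
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE = re.compile(r"\+?\d[\d\s()/-]{6,}\d")

def redact(transcript, known_names=()):
    """Replace e-mails, phone-like numbers and known names with placeholders."""
    text = EMAIL.sub("[EMAIL]", transcript)
    text = PHONE.sub("[PHONE]", text)
    for name in known_names:
        text = re.sub(rf"\b{re.escape(name)}\b", "[NAME]", text, flags=re.IGNORECASE)
    return text

print(redact(
    "Call Anna Schmidt on 030 1234 5678 or mail anna@example.com.",
    known_names=["Anna Schmidt"],
))
# Call [NAME] on [PHONE] or mail [EMAIL].
```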
Regards,
Yutong

Can chatbots learn or unlearn while chatting with trusted users

Can chatbots like [Rasa] learn from a trusted user (new employees, product IDs, product categories or properties) or unlearn when these entities are no longer current?
Or do I have to go through formal data collection, training sessions, and testing (confidence rates > given ratio) before the new version can be made operational?
If you have entity values that are being checked against a shifting list of valid values, it's more scalable to check those values against a database that is always up to date (e.g. your backend systems probably have a queryable list of current employees). Then if a user provides a value that used to be valid and now isn't, it will act the same as if a user provided an invalid value in the first place.
This way, the entity extraction can stay the same regardless of if some training examples go out of relevance -- though of course it's always good to try to keep your data up to date!
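A minimal sketch of that lookup, using an in-memory SQLite table as a stand-in for your backend (the table, column names and sample rows are hypothetical):

```python
import sqlite3

def make_demo_db():
    """Stand-in for a backend system that always knows the current employees."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employees (name TEXT PRIMARY KEY, active INTEGER)")
    conn.executemany("INSERT INTO employees VALUES (?, ?)", [("alice", 1), ("bob", 0)])
    return conn

def is_current_employee(conn, extracted_value):
    """Validate an extracted entity value against live data instead of the NLU model."""
    row = conn.execute(
        "SELECT 1 FROM employees WHERE name = ? AND active = 1",
        (extracted_value.strip().lower(),),
    ).fetchone()
    return row is not None

conn = make_demo_db()
for value in ["Alice", "Bob", "Carol"]:
    print(value, is_current_employee(conn, value))
# Alice True
# Bob False
# Carol False

# A new hire becomes valid as soon as the backend is updated; no retraining involved.
conn.execute("INSERT INTO employees VALUES ('carol', 1)")
print(is_current_employee(conn, "Carol"))  # True
```

A former employee ("Bob") and a never-valid value ("Carol" before the insert) are handled the same way, which is the behavior described above.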
Many chatbots do not have such a function, except advanced ones like Alexa, which has had the keyword "Remember" since around 2017: the user asks Alexa to commit certain facts to memory.
IMHO such a feature is a mark of "intelligence". It is not trivial to implement in ML systems, where the coefficients of the neural network models are updated by back-propagation after passing in learning examples. Rule-based systems (such as CHAT80, a QA system on geography) store their knowledge in relations that can be updated more transparently.

Is it possible to use DialogFlow simply to parse text?

Is it possible to use DialogFlow to simply parse some text and return the entities within that text?
I'm not interested in a conversation or bot-like behaviour, simply text in and list of entities out.
The entity recognition seems to be better with DialogFlow than Google Natural Language Processing and the ability to train might be useful also.
Cheers.
I've never considered this... but yeah, it should be possible. You would upload the entities with synonyms. Then, remove the "Default Fallback Intent", and make a new intent, called "catchall". Procedurally generate sentences with examples of every entity being mentioned, alone or in combination (in whatever way you expect to need to extract them). In "Settings", change the "ML Settings" so the "ML Classification Threshold" is 0.
In theory, it should now classify every input as "catchall", and return all the entities it finds...
If you play around with tagging things as sys.any, this could be pretty effective...
However, you may want to look into something that is built for this. I have made cool stuff with Aylien's NLP API. They have entity extraction, and the free tier gives you 1,000 hits per day.
EDIT: If you can run some code, instead of relying on SaaS, you could check out Rasa NLU for Entity Extraction. With a SpaCy backend it would do well on recognizing pre-trained entities, and with a different backend you can use custom entities.
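For the "procedurally generate sentences" step of the DialogFlow approach above, something along these lines could produce the training phrases. It is only a sketch: the entity names and values are made up, the phrases are bare value combinations without carrier text, and uploading them to the agent is not shown.

```python
from itertools import combinations, product

def generate_examples(entities, max_combo=2):
    """Build training phrases mentioning each entity alone or in combination.

    entities maps an entity name to a list of example values. Each result is
    (phrase, {entity_name: value}) so the annotation can be uploaded with it.
    """
    examples = []
    names = sorted(entities)
    for size in range(1, max_combo + 1):
        for combo in combinations(names, size):
            for values in product(*(entities[name] for name in combo)):
                examples.append((" ".join(values), dict(zip(combo, values))))
    return examples

# Hypothetical entities for illustration.
entities = {"color": ["red", "blue"], "size": ["small", "large"]}
for phrase, annotation in generate_examples(entities):
    print(phrase, annotation)
```

With two entities of two values each this yields 8 phrases (4 single mentions and 4 pairs); the count grows quickly, so max_combo is worth keeping small.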

How to address Nonsense queries to LUIS?

I know that I can create a None intent to cover some of these; however, we cannot just enumerate every nonsense question a person could ask.
The same goes for someone typing in a 50-word statement. The bigger problem is that when LUIS gets such a query, it assigns an intent that is not correct, without having identified any entities either.
What to do?
To handle these cases, it would be better to add more labeled utterances to your other intents and occasionally add the stray utterances to the None intent. As the model gets better at predicting your non-None intents, it also gets better at predicting the None intent (LUIS attempts to match an utterance to an intent rather than ruling intents out).
If intents are triggering without any entities being recognized (and thus you believe the wrong intent has been triggered), this should be handled at an application level, where you would then disambiguate the intents back to your users. If you've set the verbose flag to true, then you could take the top three scoring intents and present those back as options to your user. Then you can move back into the proper dialog.
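At the application level, that check could look roughly like the sketch below. It assumes a verbose response shaped like {"intents": [{"intent": ..., "score": ...}], "entities": [...]}; the function name, the sample intents and the decision to hide the None intent from the options are my own choices, not something LUIS prescribes.

```python
def next_step(luis_result, n_options=3):
    """Route when entities were found; otherwise offer the top intents to the user."""
    ranked = sorted(luis_result.get("intents", []), key=lambda i: i["score"], reverse=True)
    if ranked and luis_result.get("entities"):
        return ("route", [ranked[0]["intent"]])
    # No entities recognized: do not trust the top intent, ask the user instead.
    options = [i["intent"] for i in ranked if i["intent"] != "None"][:n_options]
    return ("disambiguate", options)

# Hypothetical verbose result with no entities recognized.
result = {
    "query": "asdf flights maybe weather",
    "intents": [
        {"intent": "BookFlight", "score": 0.41},
        {"intent": "None", "score": 0.30},
        {"intent": "GetWeather", "score": 0.22},
        {"intent": "CancelBooking", "score": 0.05},
        {"intent": "Greeting", "score": 0.01},
    ],
    "entities": [],
}
print(next_step(result))
# ('disambiguate', ['BookFlight', 'GetWeather', 'CancelBooking'])
```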
After you've moved into the intent/dialog they meant to access, you can make a programmatic API call to add that utterance to the intent. Individually adding labeled utterances can be problematic (the programmatic API key has a limit of 100,000 transactions per month, and a rate of 10 transactions per second), so you can instead aggregate the utterances and label them in batches. One additional bit of info: there is a limit of 100 labeled utterances per batch upload.
Adding to Steven's answer: in the intent window you have the Suggested Utterances tab. This is also a hint for the algorithm, a kind of reinforcement learning approach.

suggest list of how-to articles based on text content

I have 20,000 messages (combination of email and live chat) between my customer and my support staff. I also have a knowledge base for my product.
Often times, the questions customers ask are quite simple and my support staff simply point them to the right knowledge base article.
What I would like to do, in order to save my support staff time, is to show my staff a list of articles that may likely be relevant based on the initial user's support request. This way they can just copy and paste the link to the help article instead of loading up the knowledge base and searching for the article manually.
I'm wondering what solutions I should investigate.
My current line of thinking is to run analysis on existing data and use a text classification approach:
For each message, see if there is a response with a link to a how-to article
If Yes, extract key phrases (microsoft cognitive services)
TF-IDF?
Treat each how-to as a 'classification' that belongs to sets of key phrases
Use some supervised machine learning, maybe support vector machines, to predict which 'classification' (aka how-to article) matches the key phrases extracted from a new support ticket.
Feed new responses back into the set to make the system smarter.
Not sure if I'm overcomplicating things. Any advice on how this is done would be appreciated.
PS: naive approach of just dumping 'key phrases' into search query of our knowledge base yielded poor results since the content of the help article is often different than how a person phrases their question in an email or live chat.
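To make the TF-IDF idea from the list above concrete, here is a small pure-Python sketch. It compares a new ticket against past customer messages (whose linked article is known) rather than against the article text, which sidesteps the phrasing mismatch mentioned in the PS. The class name, article IDs and sample messages are hypothetical, and it uses a nearest-neighbour match instead of an SVM to stay dependency-free.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    return re.findall(r"[a-z0-9']+", text.lower())

class ArticleSuggester:
    def __init__(self, labeled_messages):
        """labeled_messages: list of (customer_message, article_id) pairs."""
        docs = [(Counter(tokenize(text)), article) for text, article in labeled_messages]
        doc_freq = Counter()
        for counts, _ in docs:
            doc_freq.update(counts.keys())
        n = len(docs)
        # Smoothed inverse document frequency.
        self.idf = {w: math.log((1 + n) / (1 + df)) + 1 for w, df in doc_freq.items()}
        self.vectors = [(self._vectorize(counts), article) for counts, article in docs]

    def _vectorize(self, counts):
        """L2-normalized TF-IDF vector; words unseen in training are ignored."""
        vec = {w: tf * self.idf[w] for w, tf in counts.items() if w in self.idf}
        norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
        return {w: v / norm for w, v in vec.items()}

    def suggest(self, message, top_n=3):
        """Rank articles by the best cosine similarity to any past message."""
        query = self._vectorize(Counter(tokenize(message)))
        best = defaultdict(float)
        for vec, article in self.vectors:
            score = sum(weight * vec.get(w, 0.0) for w, weight in query.items())
            best[article] = max(best[article], score)
        ranked = sorted(best.items(), key=lambda kv: kv[1], reverse=True)
        return [article for article, score in ranked[:top_n] if score > 0]

history = [
    ("I forgot my password and cannot log in", "kb/reset-password"),
    ("how do I reset my password", "kb/reset-password"),
    ("where can I download my invoice", "kb/billing-invoices"),
    ("I need a copy of last month's invoice", "kb/billing-invoices"),
]
suggester = ArticleSuggester(history)
print(suggester.suggest("can't log in, lost password"))  # ['kb/reset-password']
```

Feeding newly answered tickets back in is then just a matter of rebuilding the suggester with the extended history.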
A simple classifier along the lines of a "spam" classifier might work, except that each FAQ would be its own feature, as opposed to the single spam / not-spam feature of a spam classifier.
Most spam classifiers start off with a dictionary of words/phrases. You already have a start on this with your naive approach. However, unlike your approach, a spam classifier does much more than a text search. Essentially, in a spam classifier, each word in the customer's email is given a weight, and the sum of the weights indicates whether the message is spam or not-spam. Now, extend this to as many features as there are FAQs. That is, features like: FAQ1 or not-FAQ1, FAQ2 or not-FAQ2, etc.
Since your support people can easily identify which of the FAQs an e-mail requires, a supervised learning algorithm would be appropriate. To reduce the impact of any misclassification errors, consider having the application present a support person with the customer's email followed by the computer-generated response, so that all the support person has to do is approve the response or modify it. Modifying a response should result in a new entry in the training set.
Support Vector Machines are one method to implement machine learning. However, you are probably proposing this solution too early in the process: first identify the problem, then get a simple method to work as well as possible, before moving on to more sophisticated methods. After all, if a multi-feature spam classifier works, why invest more time and money in something else that also works?
Finally, depending on your system, this is something I would like to work on.
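Here is a rough sketch of that "FAQ1 or not-FAQ1" weighting. The weights are smoothed log-odds of a word appearing in messages answered with a given FAQ versus all other messages; that particular formula, the function names and the sample data are my assumptions, and a real spam-style filter would add more preprocessing.

```python
import math
import re
from collections import Counter

def words(text):
    return re.findall(r"[a-z']+", text.lower())

def train_faq_weights(labeled_messages):
    """Return {faq: {word: weight}}, one 'FAQ vs not-FAQ' weight table per FAQ."""
    per_faq = {}
    for text, faq in labeled_messages:
        per_faq.setdefault(faq, Counter()).update(words(text))
    total = Counter()
    for counts in per_faq.values():
        total.update(counts)
    vocab = set(total)
    weights = {}
    for faq, counts in per_faq.items():
        rest = total - counts  # word counts in all other FAQs
        in_total = sum(counts.values()) + len(vocab)
        out_total = sum(rest.values()) + len(vocab)
        # Add-one smoothed log-odds: positive means the word points towards this FAQ.
        weights[faq] = {
            w: math.log((counts[w] + 1) / in_total) - math.log((rest[w] + 1) / out_total)
            for w in vocab
        }
    return weights

def score_message(weights, text):
    """Sum the word weights per FAQ; unknown words contribute nothing."""
    return {faq: sum(table.get(w, 0.0) for w in words(text)) for faq, table in weights.items()}

training = [
    ("reset my password", "FAQ1"),
    ("forgot password login", "FAQ1"),
    ("download invoice pdf", "FAQ2"),
    ("invoice for last month", "FAQ2"),
]
weights = train_faq_weights(training)
scores = score_message(weights, "password reset")
print(max(scores, key=scores.get))  # FAQ1
```

A positive sum suggests the FAQ applies and a negative one that it does not, so several FAQs can be proposed for the same e-mail.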
