Azure Recommendations API's Parameters

I would like to build a recommendation model using the Recommendations API on Azure (MS Cognitive Services). I can't understand the three API parameters below for "Create/Trigger a build." What do these parameters mean?
https://westus.dev.cognitive.microsoft.com/docs/services/Recommendations.V4.0/operations/56f30d77eda5650db055a3d0
EnableModelingInsights: Allows you to compute metrics on the recommendation model. Valid values: True/False.
AllowColdItemPlacement: Indicates if the recommendation should also push cold items via feature similarity. Valid values: True/False.
ReasoningFeatureList: Comma-separated list of feature names to be used for reasoning sentences (e.g. recommendation explanations). Valid values: feature names, up to 512 chars.
Thank you!

That page is missing references to content described elsewhere. See this page for a more complete guide:
https://azure.microsoft.com/en-us/documentation/articles/machine-learning-recommendation-api-documentation/
It describes cold items in the Rank Build section of the document as follows:
Features can enhance the recommendation model, but to do so requires the use of meaningful features. For this purpose a new build was introduced - a rank build. This build will rank the usefulness of features. A meaningful feature is a feature with a rank score of 2 and up. After understanding which of the features are meaningful, trigger a recommendation build with the list (or sublist) of meaningful features. It is possible to use these features for the enhancement of both warm items and cold items. In order to use them for warm items, the UseFeatureInModel build parameter should be set up. In order to use features for cold items, the AllowColdItemPlacement build parameter should be enabled. Note: It is not possible to enable AllowColdItemPlacement without enabling UseFeatureInModel.
It also describes the ReasoningFeatureList in the Recommendation Reasoning section as follows:
Recommendation reasoning is another aspect of feature usage. Indeed, the Azure Machine Learning Recommendations engine can use features to provide recommendation explanations (a.k.a. reasoning), leading to more confidence in the recommended item from the recommendation consumer. To enable reasoning, the AllowFeatureCorrelation and ReasoningFeatureList parameters should be set up prior to requesting a recommendation build.
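For illustration, here is a rough sketch of how these parameters might be passed when triggering a recommendation build. The endpoint, model ID and the exact JSON field names are assumptions inferred from the parameter names in the question, so verify them against the "Create/Trigger a build" reference linked above before relying on them:

```python
# Rough sketch of triggering a recommendation build with the parameters
# discussed above. Endpoint, model ID, key and exact JSON field names are
# placeholders/assumptions - check them against the linked API reference.
import requests

ENDPOINT = "https://westus.api.cognitive.microsoft.com/recommendations/v4.0"
MODEL_ID = "<model-id>"
KEY = "<subscription-key>"

body = {
    "description": "Recommendation build with reasoning enabled",
    "buildType": "recommendation",
    "buildParameters": {
        "recommendation": {
            "enableModelingInsights": True,   # compute offline metrics for the build
            "useFeaturesInModel": True,       # required before cold item placement
            "allowColdItemPlacement": True,   # place cold items via feature similarity
            "allowFeatureCorrelation": True,  # required for reasoning
            "reasoningFeatureList": "brand,category,price",  # hypothetical feature names
        }
    },
}

resp = requests.post(
    f"{ENDPOINT}/models/{MODEL_ID}/builds",
    headers={"Ocp-Apim-Subscription-Key": KEY, "Content-Type": "application/json"},
    json=body,
)
print(resp.status_code, resp.json())
```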

Related

Why do I have to select the number of features in SBS in scikit-learn?

I saw the explanation of SBS on multiple sites, for example (https://www.analyticsvidhya.com/blog/2016/12/introduction-to-feature-selection-methods-with-an-example-or-how-to-select-the-right-variables/), which states:
In backward elimination, we start with all the features and remove the least significant feature at each iteration, which improves the performance of the model. We repeat this until no improvement is observed on removal of features.
I wonder why I have to choose the number of features to pick in the scikit-learn package. Shouldn't SBS stop removing features when the model doesn't improve anymore?
Am I missing something?
There is a pull request to that end: https://github.com/scikit-learn/scikit-learn/pull/20145. (See also the linked issue #20137.)
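For context, scikit-learn's SequentialFeatureSelector (available since 0.24) still asks you to fix the number of features up front; a minimal sketch follows, with the caveat that only versions including the linked PR add an "auto" stopping criterion:

```python
# Small sketch of backward feature selection in scikit-learn.
# In releases that include the linked PR, n_features_to_select="auto" together
# with a tol threshold lets the search stop once the score stops improving.
from sklearn.datasets import load_iris
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
selector = SequentialFeatureSelector(
    LogisticRegression(max_iter=1000),
    n_features_to_select=2,   # must be chosen up front in older versions
    direction="backward",
)
selector.fit(X, y)
print(selector.get_support())  # boolean mask of the selected features
```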

Can chatbots learn or unlearn while chatting with trusted users

Can chatbots like Rasa learn from a trusted user (new employees, product IDs, product categories or properties) or unlearn when these entities are no longer current?
Or do I have to go through formal data collection, training sessions and testing (confidence rates above a given ratio) before the new version can be made operational?
If you have entity values that are being checked against a shifting list of valid values, it's more scalable to check those values against a database that is always up to date (e.g. your backend systems probably have a queryable list of current employees). Then if a user provides a value that used to be valid and now isn't, it will act the same as if a user provided an invalid value in the first place.
This way, the entity extraction can stay the same regardless of whether some training examples go out of relevance -- though of course it's always good to try to keep your data up to date!
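As a minimal sketch of that idea, assuming a Rasa custom action where the action name, slot name and employees.db table are hypothetical, the lookup could go against a live database instead of a fixed training-time list:

```python
# Rough sketch of a Rasa custom action that validates an extracted entity
# against a live database rather than a fixed list learned at training time.
# The action/slot names and the employees.db schema are made up for illustration.
import sqlite3
from typing import Any, Dict, List, Text

from rasa_sdk import Action, Tracker
from rasa_sdk.executor import CollectingDispatcher


class ActionCheckEmployee(Action):
    def name(self) -> Text:
        return "action_check_employee"

    def run(
        self,
        dispatcher: CollectingDispatcher,
        tracker: Tracker,
        domain: Dict[Text, Any],
    ) -> List[Dict[Text, Any]]:
        employee = tracker.get_slot("employee_name")
        with sqlite3.connect("employees.db") as conn:
            row = conn.execute(
                "SELECT 1 FROM employees WHERE name = ? AND active = 1",
                (employee,),
            ).fetchone()
        if row:
            dispatcher.utter_message(text=f"{employee} is a current employee.")
        else:
            # A value that used to be valid now behaves like any other invalid value.
            dispatcher.utter_message(text=f"I don't know any employee called {employee}.")
        return []
```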
Many chatbots do not have such a function, except advanced ones like Alexa, which has offered the keyword "Remember" since around 2017. The user wants Alexa to commit certain facts to memory.
IMHO such a feature is a mark of "intelligence". It is not trivial to implement in ML systems where the coefficients of the neural network models are updated by back-propagation after passing learning examples. Rule-based systems (such as CHAT80, a QA system on geography) store their knowledge in relations that can be updated more transparently.

Virtual Assistant -> LUIS, QnA, Dispatcher best practice

I have some questions about best practices for certain issues that we are facing using LUIS and QnA Maker, in particular with the Dispatcher:
1) Is there any best practice in case we have more than 15k utterances in the Dispatcher? That looks like a limitation of LUIS apps, and the scalability of the model in the long run is questionable.
2) Bing Spell Check for LUIS changes names and surnames, for example; how do we avoid this? I guess that Bing Spell Check is necessary when we are talking about chatbots, since typos are always around the corner, but using it for names is dangerous.
3) Cross-validation is not supported out of the box: you would have to split your data into folds with custom code (not difficult), use the command line to train and publish your model on your k-1/k folds, then send the k-fold utterances to the API one by one. Batch upload is only supported through the UI (https://cognitive.uservoice.com/forums/551524-language-understanding-luis/suggestions/20082157-add-api-to-batch-test-model) and is limited to a test set of 1,000 utterances. If we use the one-by-one approach, we pay $1.50 per 1k transactions (https://azure.microsoft.com/de-de/pricing/details/cognitive-services/language-understanding-intelligent-services/), which means that to get cross-validation metrics for 5 folds, for example, we could be paying about $20 for a single experiment with our current data, and more if we add more data.
4) The model is a black box, which doesn't give us the ability to use custom features if needed.
I will try to address your concerns as follows:
1) As per the LUIS documentation, the training-set size of a LUIS app is limited to 15,000 utterances, so you cannot exceed that limit. In the case of Dispatch apps, if the total number of utterances exceeds 15k, then Dispatch will down-sample the utterances to keep the count under 15k. There is an optional CLI parameter (--doAutoActiveLearning) to do auto active learning, which will down-sample intelligently (remove non-relevant utterances).
--doAutoActiveLearning: (optional) Default to false. LUIS limit on training-set size is 15000. When a LUIS app has much more utterances for training, Dispatch's auto active learning process can intelligently down sample the utterances.
2) Bing Spell Check helps correct misspelled words in utterances before LUIS predicts the score and entities of the utterance. However, if you want to avoid using the Bing Spell Check API service, then you will need to add both the correct and incorrect spellings, which can be done in two ways:
Label example utterances that have all the different spellings so that LUIS can learn proper spelling as well as typos. This option requires more labeling effort than using a spell checker.
Create a phrase list with all variations of the word. With this solution, you do not need to label the word variations in the example utterances.
3) As per the current documentation, a maximum of 1,000 utterances is allowed per test: the data set is a JSON-formatted file containing a maximum of 1,000 labeled, non-duplicate utterances. You can test up to 10 data sets in an app; if you need to test more, delete a data set and then add a new one. I would suggest reporting this as a feature request in the feedback forum.
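If you do go with the one-utterance-at-a-time approach for cross-validation metrics, a rough sketch might look like the following. The URL shape follows the LUIS V3 prediction API, and the endpoint, app ID, key and slot are placeholders to adapt to your own resource and region:

```python
# Rough sketch: score held-out fold utterances one by one against a LUIS V3
# prediction endpoint and compute intent accuracy for that fold.
# Endpoint, app ID and key are placeholders - verify the URL for your region.
import requests

ENDPOINT = "https://<your-resource>.cognitiveservices.azure.com"
APP_ID = "<app-id>"
KEY = "<prediction-key>"

def predict_intent(utterance: str) -> str:
    url = f"{ENDPOINT}/luis/prediction/v3.0/apps/{APP_ID}/slots/production/predict"
    resp = requests.get(
        url,
        params={"query": utterance},
        headers={"Ocp-Apim-Subscription-Key": KEY},
    )
    resp.raise_for_status()
    return resp.json()["prediction"]["topIntent"]

def fold_accuracy(labelled_utterances):
    """labelled_utterances: list of (utterance, expected_intent) pairs."""
    hits = sum(predict_intent(u) == intent for u, intent in labelled_utterances)
    return hits / len(labelled_utterances)
```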
Hope this helps.

What is the role of feature type in AzureML?

I want to know what the difference is between "feature numeric" and plain "numeric" columns in Azure Machine Learning Studio.
The documentation site states:
Because all columns are initially treated as features, for modules that perform mathematical operations, you might need to use this option to prevent numeric columns from being treated as variables.
But nothing more. Not what a feature is, or in which modules you need features. Nothing.
I specifically would like to understand whether the Clear feature option in the Fields dropdown of the Edit Metadata module has any effect. Can somebody give me a scenario where this clear-feature operation changes the ML outcome? Thank you.
According to the documentation it ought to have an effect:
Use the Fields option if you want to change the way that Azure Machine Learning uses the data in a model.
But what can this effect be? Any example might help.
As you suspect, setting a column as a feature does have an effect, and it's actually quite important: when training a model, the algorithms will only take into account columns with the feature flag, effectively ignoring the others.
For example, if you have a dataset with columns Feature1, Feature2, and Label and you want to try out just Feature1, you would apply Clear feature to the Feature2 column (while making sure that Feature1 has the feature flag set, of course).

Can we compare two different strings for similarity using the Google NLP APIs?

Example:
String 1: Help me to track calls.
String 2: Assist me in call tracking.
These two strings have the same meaning but are not identical. Is there any way to find the similarity between strings like these using the Google Natural Language Processing APIs?
The Google Cloud Natural Language API doesn't provide a specific feature to find similarities between two different strings; instead, the service offers Content Classification functionality that you can use to classify the strings into categories and then calculate the similarity between them based on the resulting classifications. There is a helpful Content Classification Tutorial where the process required to perform these tasks is explained.
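As a rough sketch of that approach, assuming the google-cloud-language Python client, you could classify both strings and compare the returned categories; note that classifyText generally needs more text than the short examples above, so in practice you would classify longer passages:

```python
# Rough sketch: classify two texts with the Cloud Natural Language API and
# compare the returned categories. classifyText typically requires longer
# input than a single short sentence, so pad with real context in practice.
from google.cloud import language_v1

client = language_v1.LanguageServiceClient()

def classify(text: str) -> dict:
    document = language_v1.Document(
        content=text, type_=language_v1.Document.Type.PLAIN_TEXT
    )
    response = client.classify_text(request={"document": document})
    # Map category name -> confidence for easy comparison.
    return {c.name: c.confidence for c in response.categories}

cats_a = classify("Help me to track calls. ...")        # longer text needed in practice
cats_b = classify("Assist me in call tracking. ...")    # longer text needed in practice
print("Shared categories:", set(cats_a) & set(cats_b))
```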
In case this feature doesn't cover your current needs, you can use the Send Feedback button, located at the lower left and upper right corners of the service's public documentation, as well as take a look at the Issue Tracker tool, in order to raise a Natural Language API feature request and notify Google about this desired functionality.