Unlabel automatically flagged entity - azure

When you create Intents and enter their sample utterances in LUIS, the parser will sometimes classify some words as being entities. This is a nice feature when it accurately identifies them, but sometimes it mislabels them.
For example, if you have an entity for statuses of a switch (on/off), constructed as a List with "true" and "false" being the values for which "on" and "off" are synonyms, respectively, then every time you use the words "on" or "off" (which have various meanings, uses and purposes) in an intent's sample utterances, they get labeled as that entity, often inaccurately.
The documentation (https://learn.microsoft.com/en-us/azure/cognitive-services/luis/luis-how-to-add-example-utterances) states that List type entities cannot be removed from utterances. Is there any way to prevent simple words that are used as entity synonyms from being matched as entities?
Thanks!

I think the only way to do it is to remove those simple words (on, off, etc.) from the List entity's synonyms by clicking the x next to each synonym. Per the message shown when you create a List entity, list entities behave differently from other entity types and rely on direct matching:
Unlike other entity types, additional values for list entities aren't
discovered during training. This entity type is identified in
utterances by the direct matching of utterance text to the defined
values, rather than learning from context.
You could also use simple entities along with Phrase Lists to boost the signal for those instances where on/off is an entity that you want to capture. You would need to supply phrase lists that are thorough enough to help identify those instances.
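To illustrate why direct matching mislabels words like "on", here is a toy sketch in Python. It is not LUIS's implementation; the `SWITCH_STATUS` dictionary and the `match_list_entity` function are hypothetical names made up for this example, and the matching is reduced to exact token lookup.

```python
import re

# Hypothetical list entity mirroring the on/off example:
# canonical value -> list of synonyms.
SWITCH_STATUS = {"true": ["on"], "false": ["off"]}


def match_list_entity(utterance, entity):
    """Label every token that exactly equals a value or synonym, ignoring context."""
    lookup = {}
    for value, synonyms in entity.items():
        lookup[value] = value
        for synonym in synonyms:
            lookup[synonym] = value
    tokens = re.findall(r"\w+", utterance.lower())
    return [(token, lookup[token]) for token in tokens if token in lookup]


# "on" is labeled in both utterances, although only the first is about a switch.
print(match_list_entity("Turn the light on", SWITCH_STATUS))
print(match_list_entity("Give me info on the meeting", SWITCH_STATUS))
```

With the synonyms removed (`{"true": [], "false": []}`), neither utterance is labeled, which is the trade-off of the fix described above.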

Related

Adding additional classes in stanford NLP NER

The Stanford NER 3-class model provides Location, Person, and Organization recognizers. Is it possible to add additional classes to this model? For example, Sports as a class to tag sports names.
If not, is there any model to which I can add additional classes?
Note: I didn't specifically mean adding "sports" as a class. I was wondering whether it is possible to add a custom class to that model. If that is not possible in Stanford NER, is it possible with spaCy?
Sports doesn't really fall into the Entity category, as there are a limited number of them, and they are pretty fixed, unlike the names of people or locations, so you can list them all.
I would simply set up a list of sports names, and use string matching to find them.
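A minimal sketch of that string-matching approach, using only Python's standard `re` module. The `SPORTS` list and the `find_sports` helper are illustrative names, and the list itself is just a placeholder you would replace with your own.

```python
import re

# Placeholder list; extend it with the sports you care about.
SPORTS = ["tennis", "table tennis", "football", "basketball"]

# Longest names first, so "table tennis" wins over "tennis".
_alternatives = sorted(SPORTS, key=len, reverse=True)
SPORTS_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(name) for name in _alternatives) + r")\b",
    re.IGNORECASE,
)


def find_sports(text):
    """Return the sports names found in text, lowercased, in order of appearance."""
    return [match.group(0).lower() for match in SPORTS_PATTERN.finditer(text)]


print(find_sports("She plays Table Tennis and football."))
```

The word boundaries (`\b`) keep the matcher from firing inside longer words, and sorting by length avoids reporting "tennis" when the text says "table tennis".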

Training model to recognize brands as entities

I'm trying to create a model in LUIS that allow me to detect if a brand (any brand) is mentioned in an utterance. I've tried different approaches but I'm struggling to get it working.
First I have an intent searchBrand with some examples utterances:
'Help me find info about Channel'
'I want to know more about Adidas'
...
What I want is that LUIS recognizes that a brand has been mentioned in the utterance (as an entity).
I believe I have these options:
Use a List Entity: impossible since I would have to fill the list with every possible brand that exists and, moreover, the user would have to write the brand exactly as it is, not allowing typos (e.g. ralf lauren)
Use a ML Entity: I believe this could be the right approach. I've tried the following without success:
Create a ML Entity "brands"
Add a Structure with 1 component "brand"
Add to the component a Descriptor with a list of different brands as an example
Once I label the entities in the utterances, the model correctly recognizes the brands that I added to the Descriptor, but it fails to recognize other brands or typos
Another option is a pattern entity. It fits somewhere between the two options you listed. You do need to train it with the patterns, and if the pattern is off at all it will not recognize the entity (and won't recognize the intent either unless you've separately trained it with utterances, which you should). However, it seems like the phrasings in your case would be consistent enough that you could define a few patterns for this, and as you train your bot from endpoint utterances you can add additional patterns as needed. Here is an example:
As I put this together I realized I'm ignoring [help me] and [find], essentially the pattern is "info about {brand}", which may or may not be appropriate depending on your other intents. If you say something different like "Tell me more about Adidas", the intent will be recognized (I trained it with your sample utterances), but the pattern, and therefore entity, will not.
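To make the behavior concrete, here is a rough regular-expression analogue in Python of a pattern like "[help me] [find] info about {brand}". This is only a sketch of how such a template matches; it does not call LUIS, and `BRAND_PATTERN` and `extract_brand` are hypothetical names.

```python
import re

# Optional "help me" and "find", then the fixed text "info about",
# then whatever follows is captured as the brand.
BRAND_PATTERN = re.compile(
    r"^(?:help me )?(?:find )?info about (?P<brand>.+?)[.?!]?$",
    re.IGNORECASE,
)


def extract_brand(utterance):
    """Return the captured brand, or None when the utterance does not fit the pattern."""
    match = BRAND_PATTERN.match(utterance.strip())
    return match.group("brand") if match else None


print(extract_brand("Help me find info about Adidas"))
print(extract_brand("info about ralf lauren"))
print(extract_brand("Tell me more about Adidas"))
```

The first two utterances fit the template, so the brand is captured even with a typo. The third does not fit, so nothing is extracted, which mirrors the limitation described above.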
Tutorial on using Patterns in LUIS
I got it working following this:
Create a ML Entity "brands"
Add to the entity a Descriptor with a list of different brands as an example. Remember to normalize the elements in the Descriptor
Add brands to the Descriptor
Label entities as "brands" inside utterances in intent "searchBrands"
Train & test the model
It is very important to normalize everything in LUIS. I had the brands inside the Descriptor capitalized and LUIS couldn't recognize new ones. Once I normalized the brands, LUIS started suggesting new ones and recognizing more when testing the model.
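If the brand list is maintained outside the portal, the normalization can be done before pasting it into the Descriptor. The sketch below assumes "normalize" means lowercasing and trimming whitespace, since capitalized entries were the problem here; `normalize_brands` is a hypothetical helper and no LUIS API is involved.

```python
def normalize_brands(brands):
    """Lowercase, collapse whitespace, and drop empty or duplicate entries, keeping order."""
    seen = set()
    normalized = []
    for brand in brands:
        cleaned = " ".join(brand.lower().split())
        if cleaned and cleaned not in seen:
            seen.add(cleaned)
            normalized.append(cleaned)
    return normalized


print(normalize_brands(["Adidas", " adidas ", "Ralph  Lauren", ""]))
```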

Where may I find a list of words used to describe relations and relationships?

I'm working on an NLP project with millions of sentences, each of which contains two entities. For each sentence, I want to determine whether the two entities are related.
So I want to find a word list like:
['related to','induced by','the treatment of','The effects of','the treatment of','treated with','best for','in response to','approved for','response with','associated with','efficacy of ','in treating','applied to','efficacy in','efficacy and safety','efficacy at','impact on','approved','causing','but none of ','linked to','cause of','associated with','leading to','caused by','the relationship between','responsible for']
I have searched GitHub but I can't find one.
What should I do?
As you can see there are a vast number of ways in which a possible semantic relationship between two entities can be lexicalised (i.e. expressed by a word/expression) in language. Furthermore, this will be very dependent on the domain (e.g. politics, healthcare, engineering, astronomy, social sciences, etc, etc, etc). I'm not aware of any "ontology of relations".
By contrast, there will be less variety in the syntactic structures at play (i.e. dependency relations or constituent structure, depending on the syntactic formalism you use). You should be able to identify (many of) these more easily than the actual list of words used (although having a list of words would be very useful). For example, for a given verb, if one entity (noun or noun phrase) is the subject and another entity (noun or noun phrase) is the direct object, then that verb is likely to express a relation between the two. The same goes for indirect objects, oblique objects, etc.
You can use a library like spaCy to retrieve the grammatical (dependency) relations between verbs and nominal entities which you can then use to identify semantic relations. For example:
The Moon orbits the Earth.
spaCy dependencies: nsubj(orbits, Moon) dobj(orbits, Earth)
semantic relation: orbit(Moon, Earth)
Trump was impeached by Congress.
spaCy dependencies: nsubjpass(impeached, Trump) agent(impeached, by) pobj(by, Congress)
semantic relation: impeach(Congress, Trump)
spaCy also takes care of Named Entity Recognition for you, although it is trained on a specific corpus that may not match your domain. Note that I have used the lemma of the word to represent the relation (not the inflected verb form).
These are just simple examples and the number of configurations will be large and more complex verbal predicates exist (e.g. phrasal verbs), but you can pick up many semantic relations with a few patterns of grammatical dependencies just looking at simple verbs.
This requires a bit of work and I have not provided an implementation, but maybe this will help you make a start...?
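As a starting point, the two patterns above (active subject/object and passive subject/agent) could be sketched like this. To keep it runnable without a parser installed, the sketch works on a small hand-built token structure; `Tok` and `extract_relations` are hypothetical names, and the dependency labels in the demo are typed in by hand rather than produced by a model. With spaCy, each token could be converted via `Tok(t.text, t.lemma_, t.dep_, t.head.i)`.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Tok:
    text: str
    lemma: str
    dep: str   # dependency label of this token
    head: int  # index of the head token (the root points to itself)


def extract_relations(tokens: List[Tok]) -> List[Tuple[str, str, str]]:
    """Return (relation_lemma, first_argument, second_argument) triples."""
    relations = []
    for i, verb in enumerate(tokens):
        children = [(j, t) for j, t in enumerate(tokens) if t.head == i and j != i]
        subjects = [t for _, t in children if t.dep == "nsubj"]
        objects = [t for _, t in children if t.dep in ("dobj", "obj")]
        passive_subjects = [t for _, t in children if t.dep == "nsubjpass"]
        agent_indexes = [j for j, t in children if t.dep == "agent"]

        # Active voice: verb(subject, object)
        for subj in subjects:
            for obj in objects:
                relations.append((verb.lemma, subj.text, obj.text))

        # Passive voice: the noun under the "by" agent is the logical subject.
        for agent_index in agent_indexes:
            for t in tokens:
                if t.head == agent_index and t.dep == "pobj":
                    for subj in passive_subjects:
                        relations.append((verb.lemma, t.text, subj.text))
    return relations


moon = [
    Tok("The", "the", "det", 1), Tok("Moon", "Moon", "nsubj", 2),
    Tok("orbits", "orbit", "ROOT", 2), Tok("the", "the", "det", 4),
    Tok("Earth", "Earth", "dobj", 2), Tok(".", ".", "punct", 2),
]
print(extract_relations(moon))
```

Only single-token arguments are handled here; for real entities you would map each argument token back to the entity span that contains it.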

Train or Custom Word Entity Types?

I was looking through the documentation and testing Google's Natural Language API and noticed it gets a number of people, events, organizations, and locations incorrect - it appears to be using Wikipedia as a major data source so if it is not in Wikipedia it seems to have trouble identifying the type of various words. Also, if certain words appear in a name (proper noun) it seems to always identify an entity as a certain type which is not always correct.
For instance: "Congress" seems to always identify as an organization [government] even when it is part of an event name. The name "WordCamp" shows as a location, but it is an event.
Is there a way to train the Natural Language engine or provide a custom set of organizations, locations, events, etc. so that it provides more accurate type information for entities that are not extremely popular?
I am the Product manager for this product. Custom entity types are not currently supported. As per your comment about not getting some entity types right, this is true for any NLP system but our goal is to keep improving. We are working on ways for you to provide us feedback on instances that we get wrong to improve our accuracy and will share the details shortly. Note we have trained our models on multiple data sources and not just Wikipedia data. The API returns the most relevant Wikipedia article for an entity detected so if an entity has multiple interpretations, we will only return the most commonly used interpretation.

Designing a class diagram for a domain model

First, don't think I'm trying to get the job done by someone else. I'm trying to design a class diagram for a domain model, and something I'm doing is probably wrong because I'm stuck, so I just want some hints about what I'm not doing correctly so that I can continue...
For example, the user needs to search products by categories from a product list. Each category may have subcategories which may have subcategories, etc.
The first diagram I made was this (simplified):
The user also needs to get a tree list of categories which have at least one product.
For example, if this is all the categories tree:
Music instruments
    Wind
    String
        Guitars
        Violins
    Percussion
Books
    Comics
    Fiction
    Romance
I can't return a tree of Category which have at least one product because I would also get all subCategories, but not each sub category has a product associated to it.
I also can't remove items from the Category.subCategories collection to keep only items which have associated products because it would alter the Category entity, which may be shared elsewhere, this is not what I want.
I thought of making a copy, but then I would get two different instances of the same entity in the same context. Isn't that a bad thing?
So I redesigned to this:
Now I don't get a collection of child categories I don't want with each Category, I only know about its parent category, which is ok.
However, this creates a tree of categories which is navigable only from the bottom to the top, it makes no sense for the client of ProductList who will always need a top -> bottom navigation of categories.
As a solution I thought of the diagram below, but I'm not sure it is very good because it kind of duplicates things. Also, CategoryTreeItem does not seem very meaningful in the domain language.
What am I doing wrong?
This is more of an algorithmic question than a modeling question. Your first approach is totally OK, unless there are constraints you have not mentioned. You can assign a category or a sub-category to any product. If you assign a sub-category, then per this model the product also belongs to the parent category. To make this clear, I would attach a constraint stating that a product must be assigned to the finest-grained known category. E.g. guitar products would be assigned to the Guitars category, while a more unusual instrument like the Stick would get the String category (which would not mean that it is a guitar or a violin, just that it belongs to the higher-level category).
Now, when you implement Category, you might add a method assignedInstruments() that returns the assigned products, which for Guitars would return Fender, Alhambra, etc. You might extend this to assignedInstruments(levelUp:BOOL) to also get the instruments of the category above.
Generally you must be clear about what the category assignment basically means. If you change the assignment the product will end up in another list.
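A minimal Python sketch of that idea, assuming products are plain strings and each category knows only its parent. The `Category` fields and the `assigned_instruments` method are illustrative, not taken from your diagrams.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Category:
    name: str
    parent: Optional["Category"] = None
    products: List[str] = field(default_factory=list)

    def assigned_instruments(self, level_up: bool = False) -> List[str]:
        """Products assigned to this category, optionally plus those of its parent."""
        result = list(self.products)
        if level_up and self.parent is not None:
            result += self.parent.products
        return result


string = Category("String", products=["Stick"])
guitars = Category("Guitars", parent=string, products=["Fender", "Alhambra"])

print(guitars.assigned_instruments())
print(guitars.assigned_instruments(level_up=True))
```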
It depends on the purpose of the diagram. Are you applying a particular software development method that defines the purpose of this diagram in a certain context, along with its intended audience?
Because you talk about a 'domain model', I guess your goal is to provide a kind of conceptual model, i.e. a model of the concepts needed to communicate the application's functionality to end users, testers etc. In that case, the first and the second diagram are both valid, but without the operations (FilterByCategory and GetCategories), because these are not relevant for that audience. The fact that the GUI only displays a subset of the full category tree is usually not expressed in a UML diagram, but in plain text.
On the other hand, if your intention is to provide a technical design for developers, then the third diagram is valid. The developers probably need a class to persist categories in the database ('Category') and a separate class to supply categories to the GUI ('CategoryTreeItem'). You are right that this distinction is not meaningful in the domain language, but in a technical design, it is common to have such additional classes. Please check with the developers if your model is compatible with the programming language and libraries/frameworks they use.
One final remark:
In the first diagram, you specified multiplicity=1 on the parent side. This would mean that every Category has a parent, which is obviously not true. The second diagram has the correct multiplicity: 0..1. The third diagram has an incorrect multiplicity=1 on the composition of CategoryTreeItem.
From my perspective your design is overly complex.
Crafting a domain model around querying needs is usually the wrong approach. Domain models are most useful to express domain behaviors. In other words, to process commands and protect invariants within the correct boundaries.
If your Product Aggregate Root (AR) references a Category AR by id and this relationship is stored in a relational DB, then you can easily fulfill any of the mentioned querying use cases with a simple DB query. You'd start by gathering a flat representation of the tree, which could then be used to construct an in-memory tree.
These queries could be exposed through a ProductQueryService that is part of the application layer, not the domain, as they aren't used to enforce domain rules or invariants: I assumed they are used to fulfill reporting or UI display needs. That is where you could have a concept such as ProductCategoryTreeItemDTO for the in-memory representation.
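Here is a rough sketch of the "flat rows to in-memory tree" step in Python. It assumes the query returns rows of (id, parent_id, name, product_count); that row shape and the `build_tree` function are assumptions made for this example. Categories that have no products anywhere in their subtree are pruned, so the stored Category entities are never modified.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Assumed row shape from the DB query: (id, parent_id, name, product_count)
Row = Tuple[int, Optional[int], str, int]


@dataclass
class ProductCategoryTreeItemDTO:
    id: int
    name: str
    children: List["ProductCategoryTreeItemDTO"] = field(default_factory=list)


def build_tree(rows: List[Row]) -> List[ProductCategoryTreeItemDTO]:
    """Build a top-down tree, keeping only categories with products in their subtree."""
    children_of: Dict[Optional[int], List[Row]] = {}
    for row in rows:
        children_of.setdefault(row[1], []).append(row)

    def build(row: Row) -> Optional[ProductCategoryTreeItemDTO]:
        category_id, _, name, product_count = row
        built = (build(child) for child in children_of.get(category_id, []))
        kept = [item for item in built if item is not None]
        if product_count == 0 and not kept:
            return None  # nothing to show under this category
        return ProductCategoryTreeItemDTO(category_id, name, kept)

    roots = (build(row) for row in children_of.get(None, []))
    return [item for item in roots if item is not None]


rows = [
    (1, None, "Music instruments", 0), (2, 1, "Wind", 0), (3, 1, "String", 0),
    (4, 3, "Guitars", 2), (5, 3, "Violins", 0), (6, None, "Books", 0),
    (7, 6, "Comics", 0),
]
tree = build_tree(rows)
print([item.name for item in tree])
```

With these sample rows, only the Music instruments > String > Guitars branch survives, because Guitars is the only category with products.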
You are also using the wrong terms according to DDD tactical patterns in your diagrams, which is very misleading. An AR is an Entity, but an Entity is not necessarily an AR. The Entity term is mostly used to refer to a concept that is uniquely identified only within the boundary of its AR, not globally.
