Training model to recognize brands as entities - nlp

I'm trying to create a model in LUIS that allows me to detect whether a brand (any brand) is mentioned in an utterance. I've tried different approaches but I'm struggling to get it working.
First I have an intent searchBrand with some examples utterances:
'Help me find info about Channel'
'I want to know more about Adidas'
...
What I want is that LUIS recognizes that a brand has been mentioned in the utterance (as an entity).
I believe I have these options:
Use a List Entity: impossible, since I would have to fill the list with every possible brand that exists and, moreover, the user would have to write the brand exactly as it is, not allowing typos (e.g. ralf lauren)
Use a ML Entity: I believe this could be the right approach. I've tried the following without success:
Create a ML Entity "brands"
Add a Structure with 1 component "brand"
Add to the component a Descriptor with a list of different brands as an example
Once I label the entities in the utterances, the model correctly recognizes the brands that I added to the Descriptor, but it fails to recognize other brands or typos

Another option is a pattern entity. It fits somewhere between the two options you listed. You do need to train it with the patterns, and if the pattern is off at all it will not recognize the entity (and won't recognize the intent either unless you've separately trained it with utterances, which you should). However, it seems like the phrasings in your case would be consistent enough that you could define a few patterns for this, and as you train your bot from endpoint utterances you can add additional patterns as needed. Here is an example:
As I put this together I realized I'm ignoring [help me] and [find], essentially the pattern is "info about {brand}", which may or may not be appropriate depending on your other intents. If you say something different like "Tell me more about Adidas", the intent will be recognized (I trained it with your sample utterances), but the pattern, and therefore entity, will not.
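To make that behaviour concrete, here is a rough Python sketch that imitates a pattern such as "info about {brand}" with a regular expression. It is not how LUIS implements patterns; the name match_brand_pattern and the regex itself are assumptions made purely for illustration.

```python
import re
from typing import Optional

# Hypothetical stand-in for a LUIS pattern like "info about {brand}".
# Anything before "info about" is ignored, mirroring the optional
# [help me] and [find] parts mentioned above.
BRAND_PATTERN = re.compile(r"info about (?P<brand>.+?)[.?!]?$", re.IGNORECASE)


def match_brand_pattern(utterance: str) -> Optional[str]:
    """Return the text captured as the brand, or None if the pattern does not fit."""
    match = BRAND_PATTERN.search(utterance.strip())
    return match.group("brand") if match else None


print(match_brand_pattern("Help me find info about Channel"))  # fits the pattern
print(match_brand_pattern("Tell me more about Adidas"))        # does not fit
```

The second utterance returns None, which is the same trade-off described above: a phrasing outside the pattern yields no entity, so more patterns have to be added as new phrasings show up.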
Tutorial on using Patterns in LUIS

I got it working following this:
Create a ML Entity "brands"
Add to the entity a Descriptor with a list of different brands as an example. Remember to normalize the elements in the Descriptor
Add brands to the Descriptor
Label entities as "brands" inside utterances in intent "searchBrands"
Train & test the model
It is very important to normalize everything in LUIS. I had the brands inside the Descriptor capitalized and LUIS couldn't recognize new ones. Once I normalized the brands, LUIS started suggesting new ones and recognizing more of them when testing the model.
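A minimal sketch of preparing the brand list before pasting it into the Descriptor, assuming "normalize" here means lower-casing and trimming whitespace (the answer only says the capitalized entries were the problem). The helper names are hypothetical and nothing here calls LUIS.

```python
def normalize_brand(name: str) -> str:
    """Lower-case a brand name and collapse runs of whitespace."""
    return " ".join(name.lower().split())


def normalize_brands(names):
    """Normalize every name and drop duplicates while keeping the original order."""
    seen = set()
    result = []
    for name in names:
        normalized = normalize_brand(name)
        if normalized and normalized not in seen:
            seen.add(normalized)
            result.append(normalized)
    return result


print(normalize_brands(["Adidas", "  Ralph   Lauren ", "ADIDAS", "Chanel"]))
# -> ['adidas', 'ralph lauren', 'chanel']
```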

Related

Extract entities without specifying during intent specification

I am using Rasa 2.0 to build an FAQ chatbot, wherein I have a large dataset, and specifying entities while defining intents does not seem efficient to me.
I have the intents and examples defined in nlu.yml and would like to extract entities.
Here is an example of what I want to achieve,
User message -> I want a hospital in Delhi.
Entity -> Delhi, hospital
Is it possible to do so?
Entity detection is not a solved problem. There are pre-trained models that integrate with Rasa, like Duckling and spaCy, and while these tools certainly contribute a lot of knowledge, they will make errors. If you're interested in learning more about why these models can fail, you can watch this YouTube video that explains human name detection.
That's why a popular alternative is to use name-lists. There are lists of cities around the world as well as lists of baby names that you can download that might be used as a rule based alternative. You can configure this in Rasa via the RegexEntityExtractor but if you have namelists with 1000+ items then a FlashTextExtractor might be preferable.
If you've got labelled examples you can also train Rasa itself to recognise the entities. But in order to do this you will need to have labels around.
"specifying entities while defining intents does not seem efficient to me"
Labelling might not be super fun, but it is super effective. Without labelling your received utterances you won't know what intents your users are interested in.
You could use entity annotations in your nlu training data; for example, assuming you have defined building_type and city as entity names:
I want a [hospital](building_type) in [Delhi](city).
Alternatively, you could try out these options:
annotate a smaller sample (for example, those entities that are essential for your FAQ assistant)
use the RegexEntityExtractor to write some rules
if you have a list of entities, you can use lookup tables to generate the regular expressions
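The lookup-table idea in the last option can be sketched in plain Python as follows. This only illustrates turning a list of values into one regular expression; it is not Rasa's implementation, and the table contents and function names are made up for the example.

```python
import re

# Hypothetical lookup tables; in a real project these would come from your own lists.
LOOKUP_TABLES = {
    "city": ["Delhi", "New Delhi", "Mumbai"],
    "building_type": ["hospital", "school"],
}


def build_pattern(values):
    """Build one case-insensitive regex that matches any value as a whole word."""
    # Longer values go first so that "New Delhi" wins over "Delhi".
    ordered = sorted(values, key=len, reverse=True)
    alternatives = "|".join(re.escape(value) for value in ordered)
    return re.compile(rf"\b(?:{alternatives})\b", re.IGNORECASE)


def extract_entities(text):
    """Return a list of (entity_name, matched_text) pairs found in text."""
    found = []
    for entity_name, values in LOOKUP_TABLES.items():
        for match in build_pattern(values).finditer(text):
            found.append((entity_name, match.group(0)))
    return found


print(extract_entities("I want a hospital in Delhi."))
# -> [('city', 'Delhi'), ('building_type', 'hospital')]
```

A rule like this finds only values that are in the list, so typos and unseen names are missed; that is the usual trade-off against a trained extractor.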

In DDD/CQRS, should ReadModel act as ViewModel, if not then where belongs responsibility for mapping?

Assume read model ProductCatalogueItem is built from aggregates/write-models, stored separately from write-models, and contains each product available for selling, and has following properties:
basics: product_code, name, price, number_of_available_stock,
documentation: short_description, description,...
product characteristics: weight, length, depth, width, color,...
And, there are two views:
product list containing list/table/grid of available product offers, and the view needs only following basic properties: product_code, name, price, number_of_available_stock,
product details showing all the properties - basics, documentation, product characteristics.
Naturally, there come two ViewModels in mind:
ProductCatalogueListItem containing only basic properties,
ProductCatalogueItemDetails containing all the properties.
Now, there are two options (that I can see).
ViewModels are 1:1 representation of ReadModels
Therefore there are two read models, not one: ProductCatalogueListItem and ProductCatalogueItemDetails. And the read service will have two methods:
List<ProductCatalogueListItem> searchProducts(FilteringOptions),
ProductCatalogueItemDetails getProductDetails(product_code).
And, controllers return these models directly (or, mapped to dto for transport layer).
The issue here is filtering: should the read service perform the search query on a different read model than the one returned from the method call? ProductCatalogueListItem doesn't have enough information to perform filtering.
ViewModels are another projection of ReadModels
The read service will have two methods:
List<ProductCatalogueItem> searchProducts(FilteringOptions),
ProductCatalogueItem getProduct(product_code).
And, the mapping from ReadModels to ViewModels is done by upper layer (probably controller).
There is no issue with filtering. But there is another issue: more data leaves the domain layer than is actually needed, and controllers would grow with more logic. As there might be different controllers for different transport technologies, the mapping code would probably get duplicated in those controllers.
Which approach to organize responsibilities is correct according to DDD/CQRS, or completely something else?
The point is:
should I build two read models, and search using one, then return other?
should I build single read model, which is used, and then mapped to limited view to contain only base information for view?
First of all, you make a wrong assertion:
...read model ProductCatalogueItem is built from aggregates/write-models...
Read model doesn't know of aggregates or anything about write model, you build the read model directly from the database, returning the data needed by the UI.
So, the view model is the read model, and it doesn't touch the write model. That's the reason why CQRS exists: for having a different model, the read model, to optimize the queries for returning the data needed by the client.
Update
I will try to explain myself better:
CQRS is simply splitting one object into two, based on the method types. There are two method types: command (any method that mutates state) and query (any method that returns a value). That's all.
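As a rough illustration of that split, here is a tiny Python sketch. The class and method names are hypothetical and an in-memory dict stands in for whatever storage is used.

```python
class ProductCommands:
    """Command side: methods mutate state and return nothing."""

    def __init__(self, store):
        self._store = store

    def change_price(self, product_code, new_price):
        self._store[product_code]["price"] = new_price


class ProductQueries:
    """Query side: methods return values and never mutate state."""

    def __init__(self, store):
        self._store = store

    def get_price(self, product_code):
        return self._store[product_code]["price"]


# Both sides share one in-memory dict here, so no second database is involved.
store = {"P-1": {"name": "Kettle", "price": 20}}
commands = ProductCommands(store)
queries = ProductQueries(store)

commands.change_price("P-1", 25)
print(queries.get_price("P-1"))  # -> 25
```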
When you apply this pattern to the service boundary of an application, you have a write service and a read service, and so you can scale differently the command and query handling, and you can have also two models.
But CQRS is not having two databases, is not messaging, is not eventual consistency, is not updating the read model from the write model, is not event sourcing. You can do CQRS without them. I say this because I've seen some misconceptions in your assertions.
That said, the design of the read model is done according to what information the user wants to see in the UI, i.e., the read model is the view model; you have no mapping between them, they are both the same model. You can read about it in references (3) and (6) below. I think this answers your whole question. What I don't understand is the filtering issue.
Some good references
(1) http://codebetter.com/gregyoung/2010/02/16/cqrs-task-based-uis-event-sourcing-agh/
(2) http://www.cqrs.nu/Faq/command-query-responsibility-segregation
(3) "Implementing Domain Driven Design" book, by Vaughn Vernon. Chapter 4: Architecture, "Command-Query Responsibility Segregation, or CQRS" section
(4) https://kalele.io/really-simple-cqrs/
(5) https://martinfowler.com/bliki/CQRS.html
(6) http://udidahan.com/2009/12/09/clarified-cqrs/
As you have already built your read model using data that arrived from one or more services, your problem is now in another space (perhaps MVC) rather than in CQRS.
Now assume your read model is a db object and ProductCatalogueListItem and ProductCatalogueItemDetails are two view models. When you have a request to serve a list of products, you will query your read db through the read model (the ProductCatalog table). Maybe you apply additional filters using additional where clauses. Now, where do you put the mapping in your code after fetching the db objects? It's a personal choice. You don't have to do it in the upper layer at all. When I use Dapper, I fetch db objects using view models as the generic type argument, so I can directly return the result from my service method, whose return type would be IEnumerable.
For a detail view I would use the same db object. I know CQRS suggests having different read models for different views. But ask yourself: do you really need another db object for the detail view? You will need only an id to get all columns, whereas in the first case you needed some selected columns. So I would design your case as a mixture of your two methods above: have two service methods returning two different objects, but instead of a 1:1 mapping from read model to view model, have a single read db object and build two different view models from it.
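That mixture could look roughly like the following Python sketch. The rows, the project helper and the function names are hypothetical, an in-memory list stands in for the read table, and letting the detail view model inherit from the list view model is just one way to avoid repeating fields.

```python
from dataclasses import dataclass, fields

# Hypothetical rows of the single read-side table; column names follow the question.
PRODUCT_ROWS = [
    {"product_code": "P-1", "name": "Kettle", "price": 20, "number_of_available_stock": 3,
     "short_description": "1.7 l kettle", "color": "black"},
    {"product_code": "P-2", "name": "Toaster", "price": 35, "number_of_available_stock": 0,
     "short_description": "2-slice toaster", "color": "white"},
]


@dataclass
class ProductCatalogueListItem:
    product_code: str
    name: str
    price: int
    number_of_available_stock: int


@dataclass
class ProductCatalogueItemDetails(ProductCatalogueListItem):
    short_description: str
    color: str


def project(row, view_model):
    """Copy only the columns that the given view model declares."""
    return view_model(**{f.name: row[f.name] for f in fields(view_model)})


def search_products(color):
    """Filter on a column the list view never shows, then return list items."""
    return [project(r, ProductCatalogueListItem) for r in PRODUCT_ROWS if r["color"] == color]


def get_product_details(product_code):
    """Return the full detail view model for one product code."""
    row = next(r for r in PRODUCT_ROWS if r["product_code"] == product_code)
    return project(row, ProductCatalogueItemDetails)


print(search_products("black"))
print(get_product_details("P-2"))
```

Filtering runs against the full row, so the list view model does not need to carry the filter columns, and the mapping lives in the read service rather than in each controller.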

Unlabel automatically flagged entity

When you create Intents and enter their sample utterances in LUIS, the parser will sometimes classify some words as being entities. This is a nice feature when it accurately identifies them, but sometimes it mislabels them.
For example, if you have an entity for statuses of a switch (on/off), constructed as a List with "true" and "false" being the values for which "on" and "off" are synonyms, respectively, then every time you use the words "on" or "off" (which have various meanings, uses and purposes) in an intent's sample utterances, they get labeled as that entity, often inaccurately.
The documentation (https://learn.microsoft.com/en-us/azure/cognitive-services/luis/luis-how-to-add-example-utterances) states that List type entities cannot be removed from utterances. Is there any way to avoid simple words that may be used as synonyms in entities from being matched as entities?
Thanks!
I think the only way to do it is to remove those simple words as synonyms (on, off, etc.) from the List entity synonyms (clicking x next to the synonym). Per the message when you create a List entity, they behave differently than other entity types and are direct matching:
Unlike other entity types, additional values for list entities aren't
discovered during training. This entity type is identified in
utterances by the direct matching of utterance text to the defined
values, rather than learning from context.
You could also use simple entities along with Phrase Lists to help boost the signal for those instances where on/off is an entity that you want to capture. The phrase lists would need enough entries to help identify those types of instances.
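To see why direct matching produces the mislabels described in the question, here is a small Python sketch of a list entity as plain text matching. It only imitates the behaviour described in the quoted message; the data and the function name are assumptions, not LUIS code.

```python
import re

# Hypothetical list entity: canonical value -> synonyms, as in the question.
SWITCH_STATUS = {"true": ["on"], "false": ["off"]}


def match_list_entity(utterance):
    """Return (canonical_value, matched_word) for every direct match, ignoring context."""
    matches = []
    for word in re.findall(r"[a-z']+", utterance.lower()):
        for canonical, synonyms in SWITCH_STATUS.items():
            if word == canonical or word in synonyms:
                matches.append((canonical, word))
    return matches


print(match_list_entity("Turn the lamp on"))           # intended match
print(match_list_entity("Put the book on the table"))  # same match, wrong meaning
```

Both calls return the same match because nothing in the lookup considers context, which is why removing the ambiguous synonyms or switching to a learned entity is suggested above.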

Train or Custom Word Entity Types?

I was looking through the documentation and testing Google's Natural Language API and noticed it gets a number of people, events, organizations, and locations incorrect - it appears to be using Wikipedia as a major data source so if it is not in Wikipedia it seems to have trouble identifying the type of various words. Also, if certain words appear in a name (proper noun) it seems to always identify an entity as a certain type which is not always correct.
For instance: "Congress" seems to always identify as an organization [government] even when it is part of an event name. The name "WordCamp" shows as a location, but it is an event.
Is there a way to train the Natural Language engine or provide a custom set of organizations, locations, events, etc. so that it provides more accurate type information for entities that are not extremely popular?
I am the Product manager for this product. Custom entity types are not currently supported. As per your comment about not getting some entity types right, this is true for any NLP system but our goal is to keep improving. We are working on ways for you to provide us feedback on instances that we get wrong to improve our accuracy and will share the details shortly. Note we have trained our models on multiple data sources and not just Wikipedia data. The API returns the most relevant Wikipedia article for an entity detected so if an entity has multiple interpretations, we will only return the most commonly used interpretation.

Core Data Inheritance and Relationships

I'm a little confused about inheritance and relationships in Core Data, and I was hoping someone could point me to the right path. In my app I have created 3 entities, and none of them have (and are not supposed to have) common properties, but there's going to be a save and a load button for all the work that the user does. From my understanding, I need to "wrap" all the entities' "work" into an object which will be used to save and load, and my question is: do I need to create relationships between the entities? Because I have to relate them somehow, and this is what makes sense to me. Is my logic correct?
I'm implementing a budget calculator, and so that everyone understands what my issue is, I'm going to give a practical example; please correct me if my logic is incorrect:
Let's just say you are a fruit seller, and because of that it's normal to have a database of clients and also a fruit database with the kinds of fruit you sell. From my understanding I find three entities here:
Client with properties named: name, address, phone, email, etc.
Stock with properties named: name, weight, stock, cost, supplier, etc.
TheBudget with properties named: name, amount, type, cost, delivery, etc.
I didn't put all the properties because I think you get the point. As you can see, there are only two properties I could inherit; the rest are different. So, if I were doing a budget for a client, I can have as many clients as I want and also the amount of stock, but what about the actual budget?
I'm sorry if my explanation was not very clear, but if it was, what kind of relationships should I be creating? I think Client and TheBudget have a connection. What do you advise?
That's not entirely correct, but some parts are on the right track. I've broken your question down into three parts: relationships, inheritance and the Managed Object Context to hopefully help you understand each part separately:
Relationships
Relationships are usually used to indicate that one entity can 'belong' to another (i.e. an employee can belong to a company). You can set up multiple one-to-many relationships (i.e. an employee belongs to a company and a boss) and you can set up the inverse relationships (which are better described with the word 'owns' or 'has', such as 'one company has many employees').
There are many even more complicated relationships depending on your needs and a whole set of delete rules that you can tell the system to follow when an entity in a relationship is deleted. When first starting out I found it easiest to stick with one-to-one and one-to-many relationships like I've described above.
Inheritance
Inheritance is best described as a sort of base template that is used for other, more specific entities. You are correct in stating that you could use inheritance as a sort of protocol to define some basic attributes that are common across a number of entities. A good example of this would be having a base class 'Employee' with attributes 'name', 'address' and 'start date'. You could then create other entities that inherit from this Employee entity, such as 'Marketing Rep', 'HR', 'Sales Rep', etc. which all have the common attributes 'name', 'address' and 'start date' without creating those attributes on each individual entity. Then, if you wanted to update your model and add, delete or modify a common attribute, you could do so on the parent entity and all of its children will inherit those changes automatically.
Managed Object Context (i.e. saving)
Now, onto the other part of your question/statement: wrapping all of your entities into an object which will be used to save and load. You do not need to create this object; Core Data uses the NSManagedObjectContext (MOC for short) specifically for this purpose. The MOC is tasked with keeping track of objects you create, delete and modify. In order to save your changes, you simply call the save: method on your MOC.
If you post your entities and what they do, I might be able to suggest ways to set them up in Core Data. You want to do your best to set up as robust a Core Data model as you can during the initial development process. The OS needs to be able to 'upgrade' the backing store to incorporate any changes you've made between your Core Data model revisions. If you do a poor job of setting up your Core Data model initially and release your code that way, it can be very difficult to make a complicated model update when the app is in the wild (as you've probably guessed, this is advice born out of painful experience :)

Resources