Adding additional classes in Stanford NLP NER - python-3.x

For the Stanford NER 3-class model, Location, Person, and Organization recognizers are available. Is it possible to add additional classes to this model? For example, Sports as a class to tag sports names.
Or, if not, is there any model where I can add additional classes?
Note: I didn't exactly mean to add "sports" as a class. I was wondering whether it is possible to add a custom class to that model. If it is not possible in Stanford, is it possible with spaCy?

Sports doesn't really fall into the Entity category, as there are a limited number of them, and they are pretty fixed, unlike the names of people or locations, so you can list them all.
I would simply set up a list of sports names, and use string matching to find them.
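A minimal sketch of that list-plus-string-matching idea, in plain Python. The sports list and the function name find_sports are made up for illustration; a real list would be much longer, and the tokenization here is deliberately naive.

```python
# Hypothetical, deliberately short list of sports names (all lowercase).
SPORTS = {"football", "ice hockey", "tennis", "table tennis"}

# Longest name in the list, measured in words, so multi-word names can match.
MAX_WORDS = max(len(name.split()) for name in SPORTS)


def find_sports(text):
    """Return the sports names found in text, in order of appearance."""
    # Naive tokenization: split on whitespace, strip common punctuation, lowercase.
    words = [w.strip(".,;:!?()\"'").lower() for w in text.split()]
    found = []
    i = 0
    while i < len(words):
        # Try the longest candidate phrase first so "ice hockey" is matched as one name.
        for n in range(min(MAX_WORDS, len(words) - i), 0, -1):
            candidate = " ".join(words[i:i + n])
            if candidate in SPORTS:
                found.append(candidate)
                i += n
                break
        else:
            # No phrase starting at this word is in the list; move on.
            i += 1
    return found


print(find_sports("She plays ice hockey and Table Tennis."))
```

Exact matching like this will miss misspellings and inflected forms, so it only works well when the list really is closed and the text is reasonably clean.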

Related

Extract entities without specifying during intent specification

I am using Rasa 2.0 to build an FAQ chatbot, wherein I have a large dataset, and specifying entities while defining intents does not seem efficient to me.
I have the intents and examples defined in nlu.yml and would like to extract entities.
Here is an example of what I want to achieve,
User message -> I want a hospital in Delhi.
Entity -> Delhi, hospital
Is it possible to do so?
Entity detection is not a solved problem. There exist pre-trained models that integrate with Rasa like Duckling and spaCy and while these tools certainly contribute a lot of knowledge, they will make errors. If you're interested in learning more of the background on why these models can certainly fail, you can enjoy this youtube video that explains human name detection.
That's why a popular alternative is to use name-lists. There are lists of cities around the world as well as lists of baby names that you can download that might be used as a rule based alternative. You can configure this in Rasa via the RegexEntityExtractor but if you have namelists with 1000+ items then a FlashTextExtractor might be preferable.
If you've got labelled examples you can also train Rasa itself to recognise the entities. But in order to do this you will need to have labelled data.
"specifying entities while defining intents does not seem efficient to me"
Labelling might not be super fun, but it is super effective. Without labelling your received utterances you won't know what intents your users are interested in.
You could use entity annotations in your nlu training data; for example, assuming you have defined building_type and city as entity names:
I want a [hospital](building_type) in [Delhi](city).
Alternatively, you could try out these options:
annotate a smaller sample (for example, those entities that are essential for your FAQ assistant)
use the RegexEntityExtractor to write some rules
if you have a list of entities, you can use lookup tables to generate the regular expressions
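To illustrate the last option, here is a rough sketch of how a lookup table can be turned into a regular expression and used to pull out entities. This is plain Python, not Rasa's own implementation; the names lookup_to_regex, extract_entities and the LOOKUPS contents are assumptions made for the example.

```python
import re


def lookup_to_regex(entries):
    """Build one case-insensitive regex that matches any entry as a whole word."""
    # Longer entries first, so "New Delhi" is preferred over "Delhi".
    escaped = sorted((re.escape(e) for e in entries), key=len, reverse=True)
    return re.compile(r"\b(?:" + "|".join(escaped) + r")\b", re.IGNORECASE)


def extract_entities(text, lookups):
    """Return a list of dicts describing every lookup match, ordered by position."""
    found = []
    for entity_name, entries in lookups.items():
        for match in lookup_to_regex(entries).finditer(text):
            found.append({
                "entity": entity_name,
                "value": match.group(0),
                "start": match.start(),
                "end": match.end(),
            })
    return sorted(found, key=lambda e: e["start"])


# Hypothetical lookup tables keyed by entity name.
LOOKUPS = {
    "city": ["Delhi", "New Delhi", "Mumbai"],
    "building_type": ["hospital", "school"],
}

for entity in extract_entities("I want a hospital in Delhi.", LOOKUPS):
    print(entity["entity"], "->", entity["value"])
```

With very large lists a single alternation regex can become slow, which is the situation where a keyword-matching extractor is the better fit.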

Training model to recognize brands as entities

I'm trying to create a model in LUIS that allows me to detect whether a brand (any brand) is mentioned in an utterance. I've tried different approaches but I'm struggling to get it working.
First I have an intent searchBrand with some examples utterances:
'Help me find info about Channel'
'I want to know more about Adidas'
...
What I want is that LUIS recognizes that a brand has been mentioned in the utterance (as an entity).
I believe I have these options:
Use a List Entity: impossible since I would have to fill the list with every possible brand that exists and, moreover, the user would have to write the brand exactly as it is, not allowing typos (e.g. ralf lauren)
Use a ML Entity: I believe this could be the right approach. I've tried the following without success:
Create a ML Entity "brands"
Add a Structure with 1 component "brand"
Add to the component a Descriptor with a list of different brands as an example
Once I label the entities in the utterances, the model correctly recognizes the brands that I added to the Descriptor, but it fails to recognize other brands or typos
Another option is a pattern entity. It fits somewhere between the two options you listed. You do need to train it with the patterns, and if the pattern is off at all it will not recognize the entity (and won't recognize the intent either unless you've separately trained it with utterances, which you should). However, it seems like the phrasings in your case would be consistent enough that you could define a few patterns for this, and as you train your bot from endpoint utterances you can add additional patterns as needed. Here is an example:
As I put this together I realized I'm ignoring [help me] and [find], essentially the pattern is "info about {brand}", which may or may not be appropriate depending on your other intents. If you say something different like "Tell me more about Adidas", the intent will be recognized (I trained it with your sample utterances), but the pattern, and therefore entity, will not.
Tutorial on using Patterns in LUIS
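The behaviour described above (a pattern that is essentially "info about {brand}") can be mimicked with an ordinary regular expression. This is only an analogy in plain Python: it is not LUIS pattern syntax and says nothing about how LUIS matches internally. BRAND_PATTERN and extract_brand are names invented for the sketch.

```python
import re

# Rough regex analogue of the pattern "info about {brand}".
# Whatever follows "info about" up to the end of the utterance is taken as the brand.
BRAND_PATTERN = re.compile(
    r"\binfo about (?P<brand>[\w&' -]+?)[.?!]*$",
    re.IGNORECASE,
)


def extract_brand(utterance):
    """Return the brand captured by the pattern, or None if the pattern does not apply."""
    match = BRAND_PATTERN.search(utterance.strip())
    return match.group("brand") if match else None


print(extract_brand("Help me find info about Channel"))
print(extract_brand("Tell me more about Adidas"))
```

The first utterance fits the pattern, so a brand is captured even if it was never listed anywhere. The second is phrased differently, so nothing is captured, which mirrors the limitation noted in the answer.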
I got it working following this:
Create a ML Entity "brands"
Add to the entity a Descriptor with a list of different brands as an example. Remember to normalize the elements in the Descriptor
Add brands to the Descriptor
Label entities as "brands" inside utterances in intent "searchBrands"
Train & test the model
It is very important to normalize everything in LUIS. I had the brands inside the Descriptor capitalized and LUIS couldn't recognize new ones; once I normalized the brands, LUIS started suggesting new ones and recognizing more when testing the model.

Use OpenIE to extract relations given entities

I want to know whether it is possible to use OpenIE, or whether there is another available option, where I specify the entities myself instead of having OpenIE extract them from the text, and it then finds the relation between the given entities.
Eg. Obama was president of US.
Input - Obama, US
Output - president of
The kbp annotator can extract relations (a fixed set of relations from the KBP competition, things such as "born in"). There is documentation about using the full pipeline here: https://stanfordnlp.github.io/CoreNLP/api.html. One limitation of this is it won't extract general relations, but just the specific KBP ones.
No promises, but down the road we want to integrate a relation extractor into our Python code base, and make it trainable on any relation you want. Though you would have to have training data for your specific relation type.
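Whichever extractor is used, one simple way to get the "given entities, return the relation" behaviour from the question is to post-filter its (subject, relation, object) triples. The sketch below assumes the triples have already been produced; the data is hand-written for illustration and is not real CoreNLP output, and Triple and relations_between are hypothetical names.

```python
from typing import Iterable, List, NamedTuple


class Triple(NamedTuple):
    subject: str
    relation: str
    object: str


def relations_between(triples: Iterable[Triple], entity_a: str, entity_b: str) -> List[str]:
    """Return relations of triples whose subject is entity_a and object is entity_b."""
    a, b = entity_a.casefold(), entity_b.casefold()
    return [
        t.relation
        for t in triples
        if t.subject.casefold() == a and t.object.casefold() == b
    ]


# Hand-written stand-in for what an extractor might return for
# "Obama was president of US."
triples = [
    Triple("Obama", "was president of", "US"),
    Triple("Obama", "was", "president"),
]

print(relations_between(triples, "Obama", "US"))
```

Exact string comparison is the weak point here: in practice the extractor may return "Barack Obama" or "the US", so some form of entity normalization would be needed before filtering.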

Conceptualization: generalization or not?

I'm modeling an app which will let users look for real estate properties. So it's going to be a website where users will be able to look for rentals and sales of houses, flats, castles, grounds, shops, parkings, and offices. Given that, I'm hesitating over the class diagram. Should I generalize all the types of real estate properties listed above from the class RealEstateProperty, or should I just associate a class TypeOfRealEstate with it, knowing that the type "Ground", for example, can be a real estate property in its own right as well as the ground of a property such as a House or a Castle? Likewise, a parking can be a real estate property as well as the parking of a House.
Does anyone have an idea of the best way to do that? Thanks in advance.
It depends on what features of the different RealEstates your system has to implement. A class's features include attributes, methods and associations.
If all your potential RealEstates have the same features, for example ID, type, price, date and responsible agent, and you don't need to further differentiate among them, then the associated type will do the job. Model RealEstateType as an Enum (or even a class, if you expect to add new types) and associate it with a single RealEstate class.
If, at the other extreme, different RealEstates need to have different features, you will need to have them inherit from an abstract base class. For example, Ground has an attribute "area", while a building has "number of floors". Even methods or associations can be different.
Following your example, you would like to link Ground to House. This is much cleaner in the second version: just an association between the Ground and House classes. In the one-class version, you would have to link RealEstate with itself and add special restrictions (a very "ugly" design).
In summary, try to think about the features of the different RealEstates and base your RealEstate hierarchy on their differences.
You can end up with a single class or several dozen of them. :) Try to keep this hierarchy as simple as possible (fewer classes), but detailed enough to mark their different features clearly.
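The two alternatives can be sketched side by side in Python. All attribute names (price, area_m2, number_of_floors) and the sample values are assumptions made for the illustration, not part of the original model.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


# Option 1: a single class with an associated type.
class RealEstateType(Enum):
    HOUSE = "house"
    FLAT = "flat"
    GROUND = "ground"
    PARKING = "parking"


@dataclass
class RealEstate:
    id: int
    price: float
    type: RealEstateType


# Option 2: a base class with one subclass per kind of property.
@dataclass
class RealEstateProperty:
    # Treated as abstract by convention: only subclasses are meant to be instantiated.
    id: int
    price: float


@dataclass
class Ground(RealEstateProperty):
    area_m2: float


@dataclass
class House(RealEstateProperty):
    number_of_floors: int
    # Plain association: a house may come with its own ground.
    ground: Optional[Ground] = None


garden = Ground(id=2, price=50000.0, area_m2=800.0)
house = House(id=1, price=350000.0, number_of_floors=2, ground=garden)
flat = RealEstate(id=3, price=120000.0, type=RealEstateType.FLAT)

print(house.ground.area_m2, flat.type.value)
```

In option 2 the Ground can be listed on its own or attached to a House without any self-referencing association, which is the point made about the cleaner link.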

ML based domain specific named entity recognition (NER)?

I need to build a classifier which identifies NEs in a specific domain. So for instance if my domain is Hockey or Football, the classifier should accept NEs in that domain but NOT all pronouns it sees on web pages. My ultimate goal is to improve text classification through NER.
For people working in this area: how would you suggest I build such a classifier?
thanks!
If all you want is to ignore pronouns, you can run any POS tagger followed by any NER algorithm (the Stanford package is a popular implementation) and then ignore any named entities which are pronouns. However, the pronouns might refer to named entities, which may or may not turn out to be important for the performance of your classifier. The only way to tell for sure is to try.
A slightly unrelated comment: an NER system trained on domain-specific data (e.g. hockey) is more likely to pick up entities from that domain because it will have seen some of the contexts entities appear in. Depending on the system, it might also pick up entities from other domains (which you do not want, if I understand your question correctly) because of syntax, word shape patterns, etc.
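The pronoun-filtering step from this answer can be sketched as a small post-processing function. It assumes the POS tagger and the NER system have already run; the tagged tokens and entity spans below are hand-written, the tags follow the Penn Treebank convention, and drop_pronoun_entities is a name invented for the example.

```python
# Penn Treebank tags for personal, possessive and wh-pronouns.
PRONOUN_TAGS = {"PRP", "PRP$", "WP", "WP$"}


def drop_pronoun_entities(tagged_tokens, entities):
    """Keep only entity spans that contain at least one non-pronoun token.

    tagged_tokens: list of (token, pos_tag) pairs.
    entities: list of (start, end, label) token spans, end exclusive.
    """
    kept = []
    for start, end, label in entities:
        tags = {tag for _, tag in tagged_tokens[start:end]}
        # A span made up entirely of pronouns (or an empty span) is discarded.
        if not tags <= PRONOUN_TAGS:
            kept.append((start, end, label))
    return kept


# Hand-written stand-ins for tagger and NER output.
tagged = [
    ("Gretzky", "NNP"), ("said", "VBD"), ("he", "PRP"),
    ("joined", "VBD"), ("the", "DT"), ("Oilers", "NNPS"),
]
entities = [(0, 1, "PERSON"), (2, 3, "PERSON"), (5, 6, "ORGANIZATION")]

print(drop_pronoun_entities(tagged, entities))
```

As the answer notes, discarding "he" also discards a mention of Gretzky, so whether this helps the downstream classifier has to be checked empirically.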
I think something like AutoNER might be useful for this. Essentially, the input to the system is text documents from a particular domain and a list of domain-specific entities that you'd like the system to recognize (like Hockey players in your case).
According to their results in this paper, they perform well on recognizing chemical names and disease names among others.
