AutoML NLP Entity Extraction - google-cloud-nl

I have trained a model using AutoML Natural Language - Entity Extraction. So far I have trained this model to extract a single keyword under a single entity from the text; however, I want to tag a single keyword under two entities to create a hierarchy. Example: currently the keyword "Lazada" is tagged under "Lazada_Ecommerce", but I want to tag this single keyword under two entities - the sub-entity "Lazada" and the main entity "Ecommerce". It would be a great help if someone could suggest whether this is possible with the Google AutoML NLP Entity Extraction model and, if so, how.
Thanks,
Satish Kumar
Data Scientist

Google AutoML NLP Entity Extraction does not support entity hierarchies. The result of a prediction is a flat array of entities, with one element per detected entity in the text.
https://cloud.google.com/automl/docs/reference/rpc/google.cloud.automl.v1#google.cloud.automl.v1.PredictResponse
includes the property 'payload', which is an array of:
https://cloud.google.com/automl/docs/reference/rpc/google.cloud.automl.v1#google.cloud.automl.v1.AnnotationPayload
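For reference, here is a minimal sketch of reading that flat payload array with the google-cloud-automl Python client (the project, location and model IDs below are placeholders):

    # Sketch: read the flat `payload` array of a PredictResponse.
    # Project, location and model IDs are placeholders.
    from google.cloud import automl

    prediction_client = automl.PredictionServiceClient()
    model_name = automl.AutoMlClient.model_path("my-project", "us-central1", "my-model-id")

    snippet = automl.TextSnippet(content="I ordered this phone on Lazada.", mime_type="text/plain")
    response = prediction_client.predict(
        name=model_name,
        payload=automl.ExamplePayload(text_snippet=snippet),
    )

    # response.payload is a flat list of AnnotationPayload objects;
    # each detected entity carries exactly one label (display_name).
    for annotation in response.payload:
        segment = annotation.text_extraction.text_segment
        print(annotation.display_name, segment.content, annotation.text_extraction.score)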
Note: If a "sub-entity" can only have one "main entity", then you could manage entity hierarchies external to the model, i.e., train the model to predict "Lazada" and other sub-entities, and externally identify that "Lazada" and others belong to a main "Ecommerce" category. However, if your entity model could have a "Lazada" entity underneath multiple main entities then your current solution would be appropriate (e.g., "Lazada_Ecommerce", "Lazada_SomeOtherMainEntity", etc.).
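As a hedged illustration of that external approach, the mapping could be as simple as a hand-maintained lookup table applied to the predicted labels (the table contents here are made up):

    # Sketch: manage the hierarchy outside the model with a lookup table.
    # The model predicts only sub-entity labels; the table assigns main entities.
    SUB_TO_MAIN = {
        "Lazada": "Ecommerce",
        "Shopee": "Ecommerce",
        # ... other sub-entity -> main-entity pairs
    }

    def add_main_entity(display_name):
        """Return (sub_entity, main_entity) for a predicted label."""
        return display_name, SUB_TO_MAIN.get(display_name, "Unknown")

    # Applied to the payload from the previous snippet:
    # for annotation in response.payload:
    #     sub, main = add_main_entity(annotation.display_name)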

Related

Markup for relation extraction task with lstm model

I'm trying to figure out how to properly mark up data for a relation extraction task (I'm going to use an LSTM model).
So far, I have figured out that entities are highlighted using the <e1>, </e1>, <e2> and </e2> tags, and the class of the relation is indicated in a separate column.
But what should I do when, in a single sentence, one entity has relations (of the same type or of different types) to two other entities at once?
An example is shown in the image.
Or when there are four entities in one sentence and two relations are defined?
I see two options. The first is to introduce new tags
<e3>, </e3>, <e4> and </e4>
and do multi-class classification, but I haven't seen that done anywhere. The second option is to make a copy of the sentence and split the relations that way (see the sketch below).
Can you please tell me how to do this markup?
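A minimal sketch of the second option (one copy of the sentence per relation, tagging only the two entities involved in that relation); the sentence, spans and relation labels here are made up:

    # Sketch: emit one training row per relation, duplicating the sentence
    # and tagging only the two entities that participate in that relation.
    def tag_pair(tokens, e1_span, e2_span):
        """Wrap two token spans in <e1>...</e1> and <e2>...</e2> tags."""
        out = list(tokens)
        (s1, t1), (s2, t2) = e1_span, e2_span
        out[s1] = "<e1>" + out[s1]
        out[t1] = out[t1] + "</e1>"
        out[s2] = "<e2>" + out[s2]
        out[t2] = out[t2] + "</e2>"
        return " ".join(out)

    tokens = "Acme hired Alice and Bob in Berlin".split()
    # One entity ("Acme") is related to two others, so the sentence is copied twice.
    relations = [
        ((0, 0), (2, 2), "employer-employee"),  # Acme -> Alice
        ((0, 0), (4, 4), "employer-employee"),  # Acme -> Bob
    ]
    for e1, e2, label in relations:
        print(tag_pair(tokens, e1, e2) + "\t" + label)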

Issue with Relation Annotation (rel.manual) in Spacy's Prodigy tool

I am trying to build a relation extraction model via spaCy's Prodigy tool.
NOTE: ner.manual, ner.correct, rel.manual are all recipes provided by Prodigy.
The first step (ner.manual, ner.correct) involved annotating data and training a NER model that can predict entities (this step is done and the model has been obtained).
The next step involves annotating the relations between the entities. This step could be done in two different ways:
i. Label the entities and relations all from scratch
ii. Use the trained NER model to predict the entities in the UI tool, correct them if needed (similar to ner.correct), and label the relations between the entities
The issue I am now facing is that whenever I use the trained model in the recipe's loop (rel.manual), no entities are predicted.
Could someone help me with this?
PS: There is no trailing whitespace issue; I cross-verified it.
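A quick sanity check outside Prodigy (the model path and sample text are placeholders) can confirm whether the trained pipeline actually predicts entities at all, to rule out a model problem rather than a recipe problem:

    # Sketch: load the trained pipeline directly with spaCy and check its entities.
    import spacy

    nlp = spacy.load("./trained_ner_model")   # placeholder path
    doc = nlp("Alice works for Acme Corp in Berlin.")

    print("pipeline:", nlp.pipe_names)        # confirm "ner" is in the pipeline
    for ent in doc.ents:
        print(ent.text, ent.label_, ent.start_char, ent.end_char)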

How to determine relationship between two entities when there is more than one relation while creating distant supervision training data?

I understand the concepts of distant supervision. As far as I understand, the process of creating training data is as follows:
Extract named entities from sentences
Find two entities, "e1" and "e2", in each sentence
Search for these two entities in a knowledge base (Freebase etc.) to find the relationship between them
I got confused at this step. What if there is more than one relation between these two entities (e1 and e2)? If so, which relation should I select?
It depends on the model you're training.
Are you learning a model for one kind of relationship and doing bootstrapping? Then only pay attention to that one relationship and drop the others from your DB.
Are you trying to learn a bunch of relationships? Then use the presence or absence of each as a feature in your model. This is how Universal Schemas work.
Here's an image of a feature matrix from the Universal Schema paper:
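As a rough sketch of that idea (the entity pairs and relations below are made up, not from the paper), each entity pair becomes a row and each knowledge-base relation a binary column:

    # Sketch: treat the presence/absence of each KB relation as a feature
    # per entity pair, instead of forcing a single relation label.
    RELATIONS = ["born_in", "employee_of", "founder_of"]

    # toy knowledge base: entity pair -> set of relations that hold between them
    kb = {
        ("Alice", "Acme"): {"employee_of", "founder_of"},
        ("Alice", "Berlin"): {"born_in"},
    }

    def relation_vector(e1, e2):
        """Binary vector over RELATIONS for the pair (e1, e2)."""
        held = kb.get((e1, e2), set())
        return [1 if r in held else 0 for r in RELATIONS]

    print(relation_vector("Alice", "Acme"))    # [0, 1, 1]
    print(relation_vector("Alice", "Berlin"))  # [1, 0, 0]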

Spacy 2.0 NER Training

In spaCy v1 it was possible to train the NER model by providing a document and a list of entity annotations in BILOU format.
However, it seems as if in v2 training is only possible by providing entity annotations like (7, 13, 'LOC'), i.e. with entity offsets and an entity label.
Is the old way of providing a list of tokens and another list of entity tags in BILOU format still valid?
From what I gather from the documentation, it looks like the nlp.update method accepts a list of GoldParse objects, so I could create a GoldParse object for each doc and pass the BILOU tags to its entities attribute. However, would I lose important information by ignoring the other attributes of the GoldParse class (e.g. heads or tags, https://spacy.io/api/goldparse), or are the other attributes not needed for training the NER?
Thanks!
Yes, you can still create GoldParse objects with the BILUO tags. The main reason the usage examples show the "simpler" offset format is that it makes them slightly easier to read and understand.
If you only want to train the NER, you can now also use the nlp.disable_pipes() context manager and disable all other pipeline components (e.g. the 'tagger' and 'parser') during training. After the block, the components will be restored, so when you save out the model, it will include the whole pipeline. You can see this in action in the NER training examples.
How can you train using the GoldParse object? I've been trying for a while and I could not figure it out.
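A minimal sketch for spaCy 2.x, assuming a blank English pipeline and made-up training data, of building a GoldParse from BILUO tags and passing it to nlp.update inside disable_pipes:

    # Sketch: NER-only training in spaCy 2.x from BILUO tags via GoldParse.
    import random
    import spacy
    from spacy.gold import GoldParse

    nlp = spacy.blank("en")
    ner = nlp.create_pipe("ner")
    nlp.add_pipe(ner)
    ner.add_label("LOC")

    TRAIN_DATA = [
        ("I live in Berlin", ["O", "O", "O", "U-LOC"]),  # one BILUO tag per token
    ]

    other_pipes = [p for p in nlp.pipe_names if p != "ner"]
    with nlp.disable_pipes(*other_pipes):
        optimizer = nlp.begin_training()
        for itn in range(10):
            random.shuffle(TRAIN_DATA)
            for text, biluo_tags in TRAIN_DATA:
                doc = nlp.make_doc(text)
                # heads/tags can be left unset when training only the NER
                gold = GoldParse(doc, entities=biluo_tags)
                nlp.update([doc], [gold], drop=0.2, sgd=optimizer)

    print(nlp("I live in Berlin").ents)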

How to prepare text for more successful "person" entity type classification when using stanford Named Entity Recognition

I am using Stanford NER classification as part of a PHI de-identification process running on laboratory text notes. I am noticing that in some cases the classifier finds a person name and tags it, e.g. <PERSON></PERSON>, but then continues to tag much more text on either side of the found name. This loss of precision means that we could potentially lose a lot of non-PHI, valuable info. Is there a way to prepare the text so that entities are discovered more precisely?
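Not from the original thread, but as a hedged sketch of the setup being described (paths to the CRF model and jar are placeholders): running the Stanford 3-class tagger sentence by sentence via NLTK makes it easier to inspect exactly where the PERSON spans start to over-extend:

    # Sketch: tag lab-note text with Stanford NER via NLTK and inspect PERSON spans.
    from nltk import sent_tokenize, word_tokenize
    from nltk.tag import StanfordNERTagger

    tagger = StanfordNERTagger(
        "english.all.3class.distsim.crf.ser.gz",  # placeholder path to the CRF model
        "stanford-ner.jar",                       # placeholder path to the jar
        encoding="utf-8",
    )

    note = "Specimen received from Dr. John Smith. Potassium 4.1 mmol/L."
    for sentence in sent_tokenize(note):
        tagged = tagger.tag(word_tokenize(sentence))
        person_tokens = [tok for tok, tag in tagged if tag == "PERSON"]
        print(sentence, "->", person_tokens)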
