Domain of words in wordnet - nlp

In WordNet, words are classified into separate noun, adjective, adverb, and verb files. How can we get the domain of some words, or find the words in a particular domain, using WordNet?
For example, suppose I have some words like (bark, dog, cat), and all these terms are related to animal. But how can we get to know this through WordNet? Is there any mechanism for this?

You cannot relate verbs like "bark" to the "animal" cluster directly based on WordNet. You can, however, relate dog, cat, etc. as being different kinds of animals by searching the hypernyms of these terms. WordNet has a tree structure in which every word is a member of a category (an is-a relation). Traveling up this category tree from any word will eventually lead you to the root of the tree, called entity.
Therefore, you can use the notion of the lowest common ancestor (LCA) of two words in this category-tree. If the LCA of two words is animal or a hyponym of animal, then both are related. So, if you start with some prior knowledge (say, "dog is an animal"), then you can add other animals to this cluster by following this algorithm.
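The LCA idea can be sketched in a few lines. The hypernym map below is a hand-built toy standing in for WordNet's is-a hierarchy (in practice you would walk `synset.hypernyms()` via NLTK instead):

```python
# Toy hypernym map standing in for WordNet's is-a hierarchy.
HYPERNYM = {
    "dog": "canine", "canine": "carnivore", "carnivore": "mammal",
    "cat": "feline", "feline": "carnivore",
    "mammal": "animal", "animal": "organism", "organism": "entity",
    "catapult": "weapon", "weapon": "artifact", "artifact": "entity",
}

def ancestors(word):
    """Return the chain from a word up to the root 'entity'."""
    chain = [word]
    while word in HYPERNYM:
        word = HYPERNYM[word]
        chain.append(word)
    return chain

def lca(w1, w2):
    """Lowest common ancestor of two words in the toy tree."""
    common = set(ancestors(w2))
    for node in ancestors(w1):  # ordered from most to least specific
        if node in common:
            return node
    return None

print(lca("dog", "cat"))       # carnivore -- a hyponym of animal, so related
print(lca("dog", "catapult"))  # entity -- not related via animal
```

With prior knowledge that "dog is an animal", any word whose LCA with "dog" is "animal" or below it joins the animal cluster.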
To also include terms like "bark", "moo", etc., you will need to employ more complex distance measures. These are metrics that look into different types of tree-based relationships (e.g. the path score or the Wu-Palmer score) or the extent of overlap between the dictionary definitions of the words (e.g. LESK).
For example, the LESK score between "dog" and "bark" is 158, while between "dog" and "catapult" is 39. A high score thus indicates that the words belong to the same (or similar) category.
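The core of the Lesk idea is easy to sketch: count the overlapping (non-stopword) words between the dictionary definitions of the two terms. The glosses below are illustrative paraphrases, not WordNet's actual text, and real implementations normalize and weight the overlap far more carefully:

```python
STOPWORDS = frozenset({"a", "an", "the", "of", "that", "by", "or", "and", "is", "as", "for"})

def lesk_overlap(gloss1, gloss2):
    """Crude Lesk-style score: shared non-stopword tokens between two glosses."""
    t1 = set(gloss1.lower().split()) - STOPWORDS
    t2 = set(gloss2.lower().split()) - STOPWORDS
    return len(t1 & t2)

glosses = {
    "dog": "a domesticated carnivorous animal that barks and is kept as a pet",
    "bark": "the sharp explosive cry of a dog or other animal",
    "catapult": "a device for hurling large stones or other missiles",
}

print(lesk_overlap(glosses["dog"], glosses["bark"]))      # 1 (shares 'animal')
print(lesk_overlap(glosses["dog"], glosses["catapult"]))  # 0
```

A higher overlap suggests the words belong to the same or similar categories, mirroring the WS4J scores quoted above.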
A good software package (in Java) that provides such distance measures is WS4J, which also has an online demo.

Related

Does WordNet have Levels?

I'm reading a paper that says to use WordNet level 3 because, if the author had used level 5, a lot would have been lost, but I can't see how to use these supposed levels. I don't have the author's code, so I can't share it, but I can share the paper (see page 16).
"In choosing the conceptual level at which to group the nouns, I face a trade-off between specificity and coverage. For example, if I group into categories at the conceptual level of “dog”, I lose all words that exist only at a more general level, such as “mammal” and “animal”. Figure A3 in the appendix displays the share of verb-noun pairs extracted from ONET tasks that would be lost for this reason at each level of aggregation. Due to the level of generality at which ONET tasks are expressed, I would lose more than a quarter of all verb-noun pairs if I grouped at WordNet level 5, for example. (Levels with higher numbers are more specific.) I therefore use WordNet level 3 for my main results, and re-run my analyses at levels 2, 4, and 5 to check their sensitivity. While the level of aggregation does make some difference, the results for these other levels are qualitatively very similar to my baseline specification."
The way I understood the paper is that the author chooses a fixed 'depth' level (or distance to 'entity') and groups together all more specific concepts; 'dog' would then be at level 8.
In NLTK's WordNet interface, you can get the length of the shortest hypernym path from a synset up to the root 'entity' like this:
wordnet.synset('dog.n.01').min_depth()
It sounds like "level" is within the WordNet hierarchy.
The ancestors of "dog" include "carnivore", "mammal", "vertebrate", "animal", and "physical entity".
"physical entity" is a top-level concept in WordNet; I think the author is putting dog at level 5 so must be counting this as level 0. (Though "entity" is the parent of "physical entity".)
So, there is no explicit "level number" in WordNet entries, but you can get a word's level in the hierarchy by going up and counting how many hypernyms lie between it and the top of the tree.
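The counting can be sketched like this. `HYPERNYM_OF` is a hand-written toy stand-in for the single-step hypernym lookup that NLTK's `synset.hypernyms()` would give you (the real dog chain is close to this one); with NLTK itself, `len(wordnet.synset('dog.n.01').hypernym_paths()[0]) - 1` expresses the same idea:

```python
# Toy single-step hypernym lookup standing in for synset.hypernyms().
HYPERNYM_OF = {
    "dog": "canine", "canine": "carnivore", "carnivore": "placental",
    "placental": "mammal", "mammal": "vertebrate", "vertebrate": "chordate",
    "chordate": "animal", "animal": "organism", "organism": "living thing",
    "living thing": "whole", "whole": "object", "object": "physical entity",
    "physical entity": "entity",
}

def level(word):
    """Number of hypernym steps from the word up to the root 'entity'."""
    steps = 0
    while word != "entity":
        word = HYPERNYM_OF[word]
        steps += 1
    return steps

print(level("dog"))     # 13 steps in this toy chain
print(level("animal"))  # 6
```

To group nouns at level 3, you would then walk each noun's chain upward until only 3 steps from the root remain and use that ancestor as its category label.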
See also Does WordNet have "levels"? (NLP)

Where may I find a list of words used to describe relations and relationships?

I'm working on an NLP project with millions of sentences, each containing two entities. I want to find whether the two entities in each sentence have a relationship or not.
So I want to find a word list like:
['related to','induced by','the treatment of','The effects of','the treatment of','treated with','best for','in response to','approved for','response with','associated with','efficacy of ','in treating','applied to','efficacy in','efficacy and safety','efficacy at','impact on','approved','causing','but none of ','linked to','cause of','associated with','leading to','caused by','the relationship between','responsible for']
I have searched GitHub but I can't find one.
What should I do?
As you can see, there are a vast number of ways in which a possible semantic relationship between two entities can be lexicalised (i.e. expressed by a word or expression) in language. Furthermore, this will be very dependent on the domain (e.g. politics, healthcare, engineering, astronomy, social sciences, etc.). I'm not aware of any "ontology of relations".
By contrast, there will be less variety in the syntactic structures at play (i.e. dependency relations or constituent structure, depending on the syntactic formalism you use). You should be able to identify (many of) these more easily than the actual list of words used (although having a list of words would be very useful). For example, for a given verb, if one entity (noun or noun phrase) is the subject and another entity (noun or noun phrase) is the direct object, then that verb is likely to express a relation between the two. The same goes for indirect objects, oblique objects, etc.
You can use a library like spaCy to retrieve the grammatical (dependency) relations between verbs and nominal entities which you can then use to identify semantic relations. For example:
The Moon orbits the Earth.
spaCy dependencies: nsubj(orbits, Moon) obj(orbits, Earth)
semantic relation: orbit(Moon, Earth)
Trump was impeached by Congress.
spaCy dependencies: nsubjpass(impeached, Trump) agent(impeached, by) pobj(by, Congress)
semantic relation: impeach(Congress, Trump)
spaCy also takes care of Named Entity Recognition for you, although it is trained on a specific corpus that may not match your domain. Note that I have used the lemma of the word to represent the relation (not the inflected verb form).
These are just simple examples; the number of configurations will be large, and more complex verbal predicates exist (e.g. phrasal verbs), but you can pick up many semantic relations with a few patterns of grammatical dependencies just by looking at simple verbs.
This requires a bit of work, but maybe this will help you make a start...?
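A rough, hedged sketch of the dependency-pattern idea. To stay self-contained it takes the parse as hand-written `(child, relation, head)` triples like the ones above; in practice you would read these off spaCy's `token.dep_` and `token.head` attributes:

```python
def extract_relations(deps):
    """Find verb-mediated relations from (child, relation, head) triples.

    Active voice:  nsubj + obj under one verb      -> verb(subject, object)
    Passive voice: nsubjpass + agent 'by' + pobj   -> verb(agent, patient)
    """
    by_head = {}
    for child, rel, head in deps:
        by_head.setdefault(head, {})[rel] = child

    relations = []
    for verb, args in by_head.items():
        if "nsubj" in args and "obj" in args:
            relations.append((verb, args["nsubj"], args["obj"]))
        if "nsubjpass" in args and "agent" in args:
            # The agent preposition ('by') in turn heads the real agent.
            agent = by_head.get(args["agent"], {}).get("pobj")
            if agent:
                relations.append((verb, agent, args["nsubjpass"]))
    return relations

# "The Moon orbits the Earth." (verbs given as lemmas)
active = [("Moon", "nsubj", "orbit"), ("Earth", "obj", "orbit")]
# "Trump was impeached by Congress."
passive = [("Trump", "nsubjpass", "impeach"), ("by", "agent", "impeach"),
           ("Congress", "pobj", "by")]

print(extract_relations(active))   # [('orbit', 'Moon', 'Earth')]
print(extract_relations(passive))  # [('impeach', 'Congress', 'Trump')]
```

This only covers the two patterns discussed above; real data will need more patterns (indirect objects, obliques, phrasal verbs) and some robustness to parser errors.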

How can I determine the exact meaning of a word in a sentence? [duplicate]

I am using WordNet for finding synonyms of ontology concepts. How can I choose the appropriate sense for my ontology concept? E.g. there is an ontology concept "conference"; it has the following synsets in WordNet:
The noun conference has 3 senses (first 3 from tagged texts)
(12) conference -- (a prearranged meeting for consultation or exchange of information or discussion (especially one with a formal agenda))
(2) league, conference -- (an association of sports teams that organizes matches for its members)
(2) conference, group discussion -- (a discussion among participants who have an agreed (serious) topic)
Now, the 1st and 3rd synsets have the appropriate sense for my ontology concept. How can I choose only these two from WordNet?
The technology you're looking for is in the direction of semantic disambiguation / representation.
The most "traditional approach" is Word Sense Disambiguation (WSD), take a look at
https://en.wikipedia.org/wiki/Word-sense_disambiguation
https://stackoverflow.com/questions/tagged/word-sense-disambiguation
Anyone know of some good Word Sense Disambiguation software?
Then comes the next generation of Word Sense induction / Topic modelling / Knowledge representation:
https://en.wikipedia.org/wiki/Word-sense_induction
https://en.wikipedia.org/wiki/Topic_model
https://en.wikipedia.org/wiki/Knowledge_representation_and_reasoning
Then comes the most recent hype:
Word embeddings, vector space models, neural nets
Sometimes people skip the semantic representation and go directly to text similarity, comparing the differences/similarities of pairs of sentences before getting to the ultimate aim of the text processing.
Take a look at Normalize ranking score with weights for a list of STS related work.
In the other direction, there's:
ontology creation (Cyc, Yago, Freebase, etc.)
semantic web (https://en.wikipedia.org/wiki/Semantic_Web)
semantic lexical resources (WordNet, Open Multilingual WordNet, etc.)
Knowledge base population (http://www.nist.gov/tac/2014/KBP/)
There's also a recent task on ontology induction / expansion:
http://alt.qcri.org/semeval2015/task17/
http://alt.qcri.org/semeval2016/task13/
http://alt.qcri.org/semeval2016/task14/
Depending on the ultimate task, maybe one of the above technologies would help.
You can also try Babelfy, which provides Word Sense Disambiguation and Named Entity Disambiguation.
Demo:
http://babelfy.org/
API:
http://babelfy.org/guide
Take a look at this list: 100 Best GitHub: Word-sense Disambiguation
and search by WordNet - there are several appropriate libraries.
I didn't use any of them, but this one seems promising, because it is based on a classic yet effective idea (namely, the Lesk algorithm) upgraded with modern word-embedding methods. Actually, before finding it, I was going to suggest almost the same ideas.
Note also that all these methods try to find the meaning (WordNet synset, in your case) that is most similar to the context of the current word/collocation, so it is crucial to have the context of the words you're trying to disambiguate. For example, the words can come from some text, and most libraries rely on that.
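To make the context point concrete, here is a simplified Lesk-style sketch that picks the "conference" sense whose gloss overlaps the surrounding sentence the most. The glosses are paraphrased from the WordNet entries quoted in the question, and the stopword list is a hand-picked stand-in for a real one:

```python
SENSES = {
    "conference.n.01": "a prearranged meeting for consultation or exchange of information or discussion",
    "conference.n.02": "an association of sports teams that organizes matches for its members",
    "conference.n.03": "a discussion among participants who have an agreed serious topic",
}

STOP = frozenset({"a", "an", "the", "of", "or", "for", "that", "who", "its", "in"})

def tokens(text):
    """Lowercased non-stopword tokens of a text."""
    return set(text.lower().split()) - STOP

def best_sense(context):
    """Return the sense whose gloss shares the most words with the context."""
    ctx = tokens(context)
    return max(SENSES, key=lambda s: len(tokens(SENSES[s]) & ctx))

print(best_sense("the teams in the eastern conference play more matches"))
# conference.n.02 -- the sports-league sense
```

Without a context sentence there is nothing to overlap against, which is why the libraries above expect running text rather than isolated words.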


ML-based domain-specific named entity recognition (NER)?

I need to build a classifier which identifies NEs in a specific domain. So, for instance, if my domain is hockey or football, the classifier should accept NEs in that domain but NOT all pronouns it sees on web pages. My ultimate goal is to improve text classification through NER.
For people working in this area, please suggest how I should build such a classifier.
Thanks!
If all you want is to ignore pronouns, you can run any POS tagger followed by any NER algorithm (the Stanford package is a popular implementation) and then ignore any named entities which are pronouns. However, the pronouns might refer to named entities, which may or may not turn out to be important for the performance of your classifier. The only way to tell for sure is to try.
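The filtering step itself is trivial to sketch. The tagged tokens below are hand-written stand-ins for what a POS tagger plus NER system would emit (Penn Treebank tags, with `O` marking non-entities):

```python
def non_pronoun_entities(tagged):
    """Keep entity mentions whose POS tag is not a pronoun tag (PRP/PRP$)."""
    return [tok for tok, pos, ent in tagged
            if ent != "O" and pos not in ("PRP", "PRP$")]

# (token, Penn POS tag, entity label) -- hypothetical tagger/NER output
tagged = [
    ("Gretzky", "NNP", "PERSON"),
    ("scored", "VBD", "O"),
    ("and", "CC", "O"),
    ("he", "PRP", "PERSON"),   # coreferent pronoun the asker wants dropped
    ("celebrated", "VBD", "O"),
]

print(non_pronoun_entities(tagged))  # ['Gretzky']
```

Note that dropping "he" here loses the coreference link back to "Gretzky", which is exactly the trade-off mentioned above.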
A slightly unrelated comment: an NER system trained on domain-specific data (e.g. hockey) is more likely to pick up entities from that domain because it will have seen some of the contexts those entities appear in. Depending on the system, it might also pick up entities from other domains (which you do not want, if I understand your question correctly) because of syntax, word-shape patterns, etc.
I think something like AutoNER might be useful for this. Essentially, the input to the system is text documents from a particular domain and a list of domain-specific entities that you'd like the system to recognize (like Hockey players in your case).
According to their results in this paper, they perform well on recognizing chemical names and disease names among others.

Resources