I'm reading a paper that says to use WordNet level 3 because using level 5 would lose a lot of words, but I can't see how to use these supposed levels. I don't have the author's code, so I can't share it, but I can share the paper (page 16). Can you help me figure out whether this is possible and how to do it?
"In choosing the conceptual level at which to group the nouns, I face a trade-off between specificity and coverage. For example, if I group into categories at the conceptual level of “dog”, I lose all words that exist only at a more general level, such as “mammal” and “animal”. Figure A3 in the appendix displays the share of verb-noun pairs extracted from ONET tasks that would be lost for this reason at each level of aggregation. Due to the level of generality at which ONET tasks are expressed, I would lose more than a quarter of all verb-noun pairs if I grouped at WordNet level 5, for example. (Levels with higher numbers are more specific.) I therefore use WordNet level 3 for my main results, and re-run my analyses at levels 2, 4, and 5 to check their sensitivity. While the level of aggregation does make some difference, the results for these other levels are qualitatively very similar to my baseline specification."
The way I understood the paper is that the author chooses a fixed 'depth' level (i.e., distance to 'entity') and groups together all more specific concepts. 'dog' would be at level 8 then.
In WordNet, you can find a word's distance from the 'root' word 'entity' like this:

from nltk.corpus import wordnet
wordnet.synset('dog.n.01').min_depth()  # 8
It sounds like "level" refers to depth within the WordNet hierarchy: the ancestors of "dog" include "carnivore", "mammal", "vertebrate", "animal", and "physical entity".
"physical entity" is a top-level concept in WordNet; I think the author is putting dog at level 5 so must be counting this as level 0. (Though "entity" is the parent of "physical entity".)
So, there is no explicit "level number" in WordNet entries, but you can get a synset's level in the hierarchy by going up and counting how many hypernyms lie between it and the top of the tree.
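If you read "level" as depth from the root (my assumption; the paper doesn't spell it out), you can reproduce the grouping with NLTK. ancestor_at_level below is a hypothetical helper name, not anything from the paper's code:

from nltk.corpus import wordnet as wn

def ancestor_at_level(synset, level):
    """Return the ancestor of `synset` at the given depth (root = level 0),
    or None if the synset only exists at a more general level."""
    if synset.min_depth() < level:
        return None  # this word would be "lost" at this level of aggregation
    # hypernym_paths() lists every root-to-synset path; take the shortest,
    # which is the path that min_depth() measures
    path = min(synset.hypernym_paths(), key=len)
    return path[level]

dog = wn.synset('dog.n.01')
print(dog.min_depth())            # 8
print(ancestor_at_level(dog, 3))  # Synset('whole.n.02') on the shortest path
print(ancestor_at_level(wn.synset('entity.n.01'), 3))  # None

Grouping at level 3 then just means replacing every noun's synset with ancestor_at_level(synset, 3) and dropping the Nones, which is how I read the paper's trade-off between specificity and coverage.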
See also Does WordNet have "levels"? (NLP)
Product models and specifications always differ subtly.
For example:
iphone6, iphone7sp
12mm*10mm*8mm, 12*8*8, (L)12mm*(W)8mm*(H)8mm
brand-410B-12, brand-411C-09, brand410B12
So, in common E-commerce search, is there a general method to calculate the model or specification similarity?
is there a general method to calculate the model or specification similarity?
No.
This is a research topic sometimes referred to as "product matching", or more broadly "schema matching". It's a hard problem with no standard approach.
Finding out if two strings refer to the same thing is covered by entity resolution, but that's typically used for things like the names of people or organizations, where a small change is more likely to be a typo or a meaningless variation than an important difference (example: Ulysses S. Grant vs. Ulysses Grant). Because a small change in a model number may or may not be important, it's a different problem. Specifications make things even more complicated.
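That said, for narrow, well-structured fields like the dimension strings in the question, a normalization heuristic can get you part of the way before full product matching. A rough sketch (normalize_dims is a made-up helper; sorting the numbers assumes dimension order doesn't matter, which may not hold for your data):

import re

def normalize_dims(spec):
    """Heuristic: extract the numbers from a dimension string, ignoring
    units and (L)/(W)/(H) labels, and sort them so that equivalent
    writings compare equal."""
    nums = re.findall(r'\d+(?:\.\d+)?', spec)
    return tuple(sorted(float(n) for n in nums))

print(normalize_dims('12mm*10mm*8mm'))                                      # (8.0, 10.0, 12.0)
print(normalize_dims('12*8*8') == normalize_dims('(L)12mm*(W)8mm*(H)8mm'))  # True

Nothing like this generalizes to model numbers such as brand-410B-12, where a single changed character can mean a different product.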
Here are some papers you can look at for example approaches:
Synthesizing Products for Online Catalogs - Semantic Scholar
Matching Unstructured Product Offers to Structured Product Descriptions - Microsoft Research
Tailoring entity resolution for matching product offers
I am using WordNet for finding synonyms of ontology concepts. How can I choose the appropriate sense for my ontology concept? E.g., there is an ontology concept "conference"; it has the following synsets in WordNet:
The noun conference has 3 senses (first 3 from tagged texts)
(12) conference -- (a prearranged meeting for consultation or exchange of information or discussion (especially one with a formal agenda))
(2) league, conference -- (an association of sports teams that organizes matches for its members)
(2) conference, group discussion -- (a discussion among participants who have an agreed (serious) topic)
Now the 1st and 3rd synsets have the appropriate sense for my ontology concept. How can I choose only these two from WordNet?
The technology you're looking for is in the direction of semantic disambiguation / representation.
The most "traditional approach" is Word Sense Disambiguation (WSD), take a look at
https://en.wikipedia.org/wiki/Word-sense_disambiguation
https://stackoverflow.com/questions/tagged/word-sense-disambiguation
Anyone know of some good Word Sense Disambiguation software?
Then comes the next generation of Word Sense induction / Topic modelling / Knowledge representation:
https://en.wikipedia.org/wiki/Word-sense_induction
https://en.wikipedia.org/wiki/Topic_model
https://en.wikipedia.org/wiki/Knowledge_representation_and_reasoning
Then comes the most recent hype:
Word embeddings, vector space models, neural nets
Sometimes people skip the semantic representation and go directly to text similarity, comparing pairs of sentences for their differences/similarities before getting to the ultimate aim of the text processing.
Take a look at Normalize ranking score with weights for a list of STS related work.
In the other direction, there's:
ontology creation (Cyc, Yago, Freebase, etc.)
semantic web (https://en.wikipedia.org/wiki/Semantic_Web)
semantic lexical resources (WordNet, Open Multilingual WordNet, etc.)
Knowledge base population (http://www.nist.gov/tac/2014/KBP/)
There's also a recent task on ontology induction / expansion:
http://alt.qcri.org/semeval2015/task17/
http://alt.qcri.org/semeval2016/task13/
http://alt.qcri.org/semeval2016/task14/
Depending on the ultimate task, any of the above technologies might help.
You can also try Babelfy, which provides Word Sense Disambiguation and Named Entity Disambiguation.
Demo:
http://babelfy.org/
API:
http://babelfy.org/guide
Take a look at this list: 100 Best GitHub: Word-sense Disambiguation
and search for "WordNet" - there are several appropriate libraries.
I didn't use any of them, but this one seems promising, because it is based on a classic yet effective idea (namely, the Lesk algorithm) upgraded with modern word-embedding methods. Actually, before finding it, I was going to suggest trying almost the same ideas.
Note also that all these methods try to find the meaning (a WordNet synset, in your case) that is most similar to the context of the current word/collocation, so it is crucial to have the context of the words you're trying to disambiguate. For example, words can come from some text, and most libraries rely on that.
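For a quick start on the classic side, NLTK ships a simplified Lesk implementation. A minimal sketch (the context sentence is invented, and simplified Lesk is a weak baseline, so don't expect it to always pick the sense you'd choose):

from nltk import word_tokenize
from nltk.wsd import lesk

# Invented context in which "conference" should mean the meeting sense
context = word_tokenize("The researchers presented their paper at the conference.")
sense = lesk(context, 'conference', pos='n')
print(sense, '--', sense.definition())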
What is the relation between semantic interoperability and an upper ontology?
A couple of points:
You don't mention how it will automatically perform the validity checking.
I don't agree with this statement:
"This tells us that it is actually impossible to define most English words using English alone"
An ontology should do precisely that; the tricky bit is to work out which one to use (there is a list of 18 top-level upper ontologies at http://www.acutesoftware.com.au/aikif/ontology.html).
You need to clarify exactly what outputs this system is going to produce. I understand the idea of mapping the entities, regulations, etc., but what are you going to do with this and how will the information be used?
In WordNet there are a number of words classified separately in noun, adjective, adverb, and verb files. How can we get the domain of some words, or the words in a particular domain, using WordNet?
For example, suppose I have some words like (bark, dog, cat), and all these terms are related to "animal". But how can we get to know this through WordNet? Is there any mechanism for this?
You cannot relate verbs like "bark" to the "animal" cluster directly based on WordNet. You can, however, relate dog, cat, etc. as being different kinds of animals by searching the hypernyms of these terms. WordNet has a tree structure where any word is-a member of a category. Traveling up this category tree from any word will eventually lead you to the root of the tree, called entity.
Therefore, you can use the notion of the lowest common ancestor (LCA) of two words in this category-tree. If the LCA of two words is animal or a hyponym of animal, then both are related. So, if you start with some prior knowledge (say, "dog is an animal"), then you can add other animals to this cluster by following this algorithm.
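A minimal sketch of that algorithm with NLTK (is_animal is just an illustrative name for the check described above):

from nltk.corpus import wordnet as wn

animal = wn.synset('animal.n.01')

def is_animal(word):
    """True if any noun sense of `word` has animal.n.01 among its
    hypernym ancestors, i.e. its LCA with 'animal' is animal itself."""
    for synset in wn.synsets(word, pos=wn.NOUN):
        ancestors = {h for path in synset.hypernym_paths() for h in path}
        if animal in ancestors:
            return True
    return False

print(is_animal('dog'))   # True
print(is_animal('cat'))   # True
print(is_animal('bark'))  # False: no noun sense of "bark" is a kind of animal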
To also include terms like "bark", "moo", etc., you will need to employ more complex distance measures. These are metrics that look into different types of tree-based relationships (e.g. the path score or the Wu-Palmer score) or the extent of overlap between the dictionary definitions of the words (e.g. Lesk).
For example, the Lesk score between "dog" and "bark" is 158, while between "dog" and "catapult" it is 39. A high score thus indicates that the words belong to the same (or a similar) category.
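NLTK exposes the tree-based measures directly; a quick sketch (note the Lesk scores quoted above come from a different implementation, so NLTK's numbers are not on the same scale):

from nltk.corpus import wordnet as wn

dog = wn.synset('dog.n.01')
cat = wn.synset('cat.n.01')

print(dog.lowest_common_hypernyms(cat))  # [Synset('carnivore.n.01')] -- the LCA
print(dog.path_similarity(cat))          # 0.2
print(dog.wup_similarity(cat))           # ~0.857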
A good software package (in Java) where such distance measures are provided is the WS4J package. They have an online demo here.