In the NLTK library, WordNet synsets come with a bunch of relations such as hyponyms, hypernyms, holonyms and meronyms.
Can someone explain these terms and perhaps provide a few examples?
Wordnet is well-documented (there is even a book). This is perhaps the docs page that most directly answers your questions:
https://wordnet.princeton.edu/documentation/wngloss7wn
Also, the terms come from linguistics and weren't invented by the WordNet team, so dictionaries will be useful. Or you can even get meta about it and look them up in WordNet :-)
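For a concrete feel, here is a small illustrative snippet using NLTK's WordNet interface (it assumes nltk is installed and the WordNet data has been downloaded):

```python
from nltk.corpus import wordnet as wn  # requires nltk.download('wordnet')

dog = wn.synset('dog.n.01')
print(dog.hypernyms())        # more general synsets, e.g. canine.n.02
print(dog.hyponyms())         # more specific synsets, e.g. puppy.n.01
print(dog.member_holonyms())  # groups a dog belongs to, e.g. pack.n.06

tree = wn.synset('tree.n.01')
print(tree.part_meronyms())   # parts of a tree, e.g. trunk.n.01, limb.n.02
```

Roughly: a hypernym is a more general term, a hyponym a more specific one, a holonym names the whole that something belongs to, and a meronym names one of its parts.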
This is not a code question, but one about concepts. I want to know who the main authors/researchers are in Information Extraction, Natural Language Processing and Text Mining, so that I can read their papers/books/work.
You will find very good references on Quora under:
What are the most important research papers which all NLP students should definitely read?
While not a definitive list, the ACL Anthology Network has a list of rankings that give you a sense of what papers are frequently cited in Computational Linguistics.
For me, Daniel Jurafsky, Christopher Manning and Tom Mitchell.
Stanford is offering an online class on natural language processing. Visit http://www.nlp-class.org/
Look at The Handbook of Data Mining by Nong Ye for a collection of many papers. This should also point you to the key researchers in text/data mining.
http://www.amazon.com/Handbook-Mining-Human-Factors-Ergonomics/dp/0805855637/ref=sr_1_1?s=books&ie=UTF8&qid=1328297313&sr=1-1
For the record, I own this book.
Which tools would you recommend to look into for semantic analysis of text?
Here is my problem: I have a corpus of words (keywords, tags).
I need to process sentences input by users and find whether they are semantically close to the words in the corpus that I have.
Any kind of suggestions (books or actual toolkits / APIs) are very welcome.
Some useful links to begin with:
http://alias-i.com/lingpipe/demos/tutorial/sentiment/read-me.html
http://kmandcomputing.blogspot.com/2008/06/opinion-mining-with-rapidminer-quick.html
http://rapid-i.com/content/blogcategory/38/69/
http://www.cs.cornell.edu/People/pabo/movie-review-data/otherexperiments.html
http://wordnet.princeton.edu/
Tools/Libraries:
Apache OpenNLP
LingPipe
If you consider your corpus as an ontology, Apache Stanbol - http://incubator.apache.org/stanbol/ - might be useful. It uses DBpedia as the default ontology while analyzing text. Although it is incubating, the enhancer component is good enough for adoption, so you can give it a try.
You can try some WordNet similarity measurements. Ted Pedersen has a compilation of those metrics in WordNet::Similarity, which you can experiment with and look into. There are counterpart implementations in other languages (e.g. Java).
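For example, a rough sketch of the NLTK counterparts of those similarity measures (again assuming nltk and the WordNet data are available):

```python
from nltk.corpus import wordnet as wn  # requires nltk.download('wordnet')

dog = wn.synset('dog.n.01')
cat = wn.synset('cat.n.01')

# Two of the measures also found in WordNet::Similarity
print(dog.path_similarity(cat))  # shortest-path-based score in [0, 1]
print(dog.wup_similarity(cat))   # Wu-Palmer similarity
```

You could score each user sentence word against your corpus keywords this way and threshold the result, though it only works for words WordNet actually covers.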
Situation:
I wish to perform a Deep-level Analysis of a given text, which would mean:
Ability to extract keywords and assign importance levels based on contextual usage.
Ability to draw conclusions on the mood expressed.
Ability to hint at the education level (Word does this a little bit, but I'm after something more automated)
Ability to mix and match phrases and find out certain communication patterns
Ability to draw substantial meaning out of it, so that it can be quantified and can be processed for answering by a machine.
Question:
What kind of algorithms and techniques need to be employed for this?
Is there a software that can help me in doing this?
When you figure out how to do this please contact DARPA, the CIA, the FBI, and all other U.S. intelligence agencies. Contracts for projects like these are items of current research worth many millions in research grants. ;)
That being said, you'll need to process the text in layers and analyze at each of those layers. For items 2 and 3 you'll find that training an SVM on word n-grams (try n = 3) will help. For items 1 and 4 you'll want deeper analysis: use a tool like NLTK, or one of the many other parsers, to find the subject words in sentences and the words related to them. Also use WordNet (from Princeton) to find the most common senses used and take those as keywords.
Item 5 is extremely challenging. I think intelligent use of the data above can give you what you want, but you'll need all your grammatical and programming knowledge, and it will still be very rough-grained.
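As a rough illustration of the n-gram + SVM suggestion above, here is a minimal sketch using scikit-learn; the two training sentences and labels are made up purely for demonstration, and in practice you would need a real labelled corpus:

```python
# Word n-grams (up to trigrams) fed to a linear SVM for mood classification.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

train_texts = ["what a wonderful, delightful day", "this is terrible and disappointing"]
train_moods = ["positive", "negative"]

model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 3)),  # unigrams, bigrams and trigrams as features
    LinearSVC(),
)
model.fit(train_texts, train_moods)
print(model.predict(["a delightful surprise"]))  # predicted mood label for new text
```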
It sounds like you might be open to some experimentation, in which case a toolkit approach might be best? If so, look at the NLTK Natural Language Toolkit for Python. Open source under the Apache license, and there are a couple of excellent books about it (including one from O'Reilly which is also released online under a creative commons license).
I have started working on a project which requires Natural Language Processing. We have to do spell checking as well as mapping sentences to phrases and their synonyms. I first thought of using GATE, but I am confused about what to use. I found an interesting post here which got me even more confused.
http://lordpimpington.com/codespeaks/drupal-5.1/?q=node/5
Please help me decide what suits my purpose best. I am working on a web application which will use this NLP tool as a service.
You didn't really give much info, but try this: http://www.nltk.org/
I don't think NLTK does spell checking (I could be wrong on this), but it can do part-of-speech tagging for text input.
For finding/matching synonyms you could use something like WordNet http://wordnet.princeton.edu/
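A small sketch of both ideas with NLTK (assumes the relevant NLTK data packages, e.g. punkt, the POS tagger model and WordNet, have been downloaded):

```python
import nltk
from nltk.corpus import wordnet as wn
# requires nltk.download('punkt'), nltk.download('averaged_perceptron_tagger'),
# nltk.download('wordnet')

tokens = nltk.word_tokenize("The quick brown fox jumps over the lazy dog")
print(nltk.pos_tag(tokens))  # (word, part-of-speech tag) pairs

# Synonyms: collect lemma names across all WordNet synsets of a word
synonyms = {lemma.name() for syn in wn.synsets("quick") for lemma in syn.lemmas()}
print(synonyms)
```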
If you're doing something really domain specific: I would recommend coming up with your own ontology for domain specific terms.
If you are using Python you can develop a spell checker with PyEnchant.
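For example, a minimal PyEnchant sketch (assumes the pyenchant package plus the underlying Enchant library and an English dictionary are installed):

```python
import enchant  # pip install pyenchant

d = enchant.Dict("en_US")
word = "langauge"
if not d.check(word):          # True if the word is spelled correctly
    print(d.suggest(word))     # e.g. ['language', ...]
```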
NLTK is good for developing a sentiment analysis system too. I have some prototypes of that as well.
If you are using deep-learning-based models, and if you have sufficient data, you can build task-specific models for almost any purpose. With the development of deep-learning-based language models, you can use word-embedding-based models together with lexicon resources to obtain synonyms and antonyms (see the sketch after the links below). You can also follow the links below for more resources.
https://stanfordnlp.github.io/CoreNLP/
https://www.nltk.org/
https://wordnet.princeton.edu/
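As a rough illustration of the word-embedding idea, here is a sketch using gensim's downloader with pretrained GloVe vectors; the model name and library choice are just one possible setup, not something prescribed above:

```python
# Illustrative only: downloads the pretrained model on first use;
# any other embedding model would work similarly.
import gensim.downloader as api

vectors = api.load("glove-wiki-gigaword-50")
print(vectors.most_similar("happy", topn=5))   # nearest neighbours in embedding space
print(vectors.similarity("good", "great"))     # cosine similarity between two words
```

Note that nearest neighbours in embedding space mix synonyms with merely related words (and sometimes antonyms), which is why the lexicon resources mentioned above are still useful.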
I'm looking to extract names and places from very short bursts of text, for example:
"cardinals vs jays in toronto"
" Daniel Nestor and Nenad Zimonjic play Jonas Bjorkman w/ Kevin Ullyett, paris time to be announced"
"jenson button - pole position, brawn-mercedes - monaco".
This data is currently in a MySQL database, and I (pretty much) have a separate record for each athlete, though names are sometimes spelled wrong, etc.
I would like to extract the athletes and locations.
I usually work in PHP, but haven't been able to find a library for entity extraction (and I may want to get deeper into some NLP and ML in the future).
From what I've found, LingPipe and NLTK seem to be the most recommended, but I can't figure out if either will really suit my purpose, or if something else would be better.
I haven't programmed in either Java or Python, so before I start learning new languages, I'm hoping to get some advice on what route I should follow, or other recommendations.
What you're describing is named entity recognition. So I'd recommend checking out the other questions regarding this topic if you haven't already seen them. This looks like the most useful answer to me.
I can't really comment on whether NLTK or LingPipe is best suited for this task, although from looking at the answers it looks like there are quite a few other resources written in Java.
One advantage of going with NLTK is that Python is very accessible as a language. The other advantage is that the NLTK book (which is available for free) offers an introduction to both Python and NLTK at the same time, which would be useful for you.
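If you do end up trying NLTK, a minimal named entity recognition sketch looks roughly like this (assumes the punkt, POS tagger, maxent_ne_chunker and words data packages have been downloaded; results on lowercase, telegraphic text like your examples will be rough):

```python
import nltk
# requires nltk.download('punkt'), nltk.download('averaged_perceptron_tagger'),
# nltk.download('maxent_ne_chunker'), nltk.download('words')

text = "Daniel Nestor and Nenad Zimonjic play Jonas Bjorkman in Paris"
tree = nltk.ne_chunk(nltk.pos_tag(nltk.word_tokenize(text)))

# Walk the chunk tree and print any recognized entity spans with their types
for subtree in tree.subtrees():
    if subtree.label() in ("PERSON", "GPE", "LOCATION", "ORGANIZATION"):
        print(subtree.label(), " ".join(word for word, tag in subtree.leaves()))
```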