I am working on an opinion mining algorithm in which I am trying to find the polarity of a particular word.
The algorithm states: search for any other POS categories such as noun, adjective, or adverb, and accumulate their polarity values using SentiWordNet.
I integrated SentiWordNet into my current system and it works perfectly for determining the polarity of a sentence. But I want the polarity of a particular word.
I found one method, senti_classifier.synsets_score(), which seems to be useful, but I am unable to find any documentation for it.
Can anyone describe the usage of the above method or point me to its documentation?
Is there any other way by which I can find the polarity of a particular word?
Thanks in advance
You can use the example code by Petter Törnberg provided on the SentiWordNet site. It calculates the sentiment score of each word in the thesaurus as a weighted average of the scores of its synsets.
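If you are working in Python rather than Java, the same idea can be approximated with NLTK's SentiWordNet corpus reader. This is only a rough sketch of the approach, not the demo code itself; the 1/rank sense weighting follows the spirit of the weighted average described above, and the example words are purely illustrative:

    # Rough sketch: average a word's SentiWordNet scores over its synsets,
    # weighting sense k by 1/k so the more frequent senses dominate.
    import nltk
    from nltk.corpus import sentiwordnet as swn

    nltk.download('wordnet')
    nltk.download('sentiwordnet')

    def word_polarity(word, pos=None):
        """Return (positive, negative) scores for a word, averaged over its senses."""
        synsets = list(swn.senti_synsets(word, pos))
        if not synsets:
            return 0.0, 0.0
        weights = [1.0 / (rank + 1) for rank in range(len(synsets))]
        total = sum(weights)
        pos_score = sum(w * s.pos_score() for w, s in zip(weights, synsets)) / total
        neg_score = sum(w * s.neg_score() for w, s in zip(weights, synsets)) / total
        return pos_score, neg_score

    print(word_polarity('good', 'a'))    # adjective senses only
    print(word_polarity('terrible'))     # all senses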
How to find the semantic similarity between any two given sentences?
E.g.:
what movies did ron howard direct?
movies directed by ron howard.
I know it's a hard problem, but I would like to ask the views of experts.
I don't know how to use the parts of speech to achieve this.
http://nlp.stanford.edu:8080/parser/index.jsp
It's a broad problem. I would personally go for cosine similarity.
You need to convert your sentences into vectors. When building the vectors you can consider several rules, such as number of occurrences, word order, synonyms, etc. Then take the cosine distance between them, as mentioned here.
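For concreteness, here is a minimal sketch of that approach using a plain bag-of-words vector (no synonym handling or word-order rules, which you would layer on top):

    # Minimal cosine similarity between two sentences using bag-of-words counts.
    import math
    import re
    from collections import Counter

    def tokenize(text):
        return re.findall(r"[a-z0-9]+", text.lower())

    def cosine_similarity(sentence_a, sentence_b):
        a, b = Counter(tokenize(sentence_a)), Counter(tokenize(sentence_b))
        dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
        norm_a = math.sqrt(sum(c * c for c in a.values()))
        norm_b = math.sqrt(sum(c * c for c in b.values()))
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    print(cosine_similarity("what movies did ron howard direct?",
                            "movies directed by ron howard."))

Note that without stemming or synonym expansion the two example sentences only match on their literal shared tokens ("movies", "ron", "howard"), which is exactly where the extra rules mentioned above come in.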
You can also explore Elasticsearch for finding associated words. You can create custom analyzers, stemmers, tokenizers, and filters (such as synonym filters), which can be very helpful in finding similar sentences. Elasticsearch also provides the More Like This query, which finds similar documents using tf-idf scores.
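As an illustration, a More Like This query through the official Python client might look roughly like this; the index name, field name, and localhost address are assumptions, and the exact request shape differs slightly between client versions:

    # Index a few sentences, then ask Elasticsearch for documents similar to a
    # query sentence using the More Like This query (tf-idf based).
    from elasticsearch import Elasticsearch

    es = Elasticsearch("http://localhost:9200")   # assumes a local cluster

    sentences = [
        "what movies did ron howard direct?",
        "movies directed by ron howard.",
        "the weather is nice today",
    ]
    for i, text in enumerate(sentences):
        es.index(index="sentences", id=i, document={"text": text})
    es.indices.refresh(index="sentences")

    resp = es.search(index="sentences", query={
        "more_like_this": {
            "fields": ["text"],
            "like": "films ron howard directed",
            "min_term_freq": 1,
            "min_doc_freq": 1,
        }
    })
    for hit in resp["hits"]["hits"]:
        print(hit["_score"], hit["_source"]["text"])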
I am trying to get the LESK similarity score between all the senses of two words using ws4j, in the format word#pos#sense number for all the senses of both words, as the online ws4j demo does. However, I am not able to find out how to get the result in the same format using the ws4j library. There is no proper documentation available for ws4j, and the demo/sample code for this library only returns the maximum value, which also does not tell me which senses produced that score.
Can anybody help?
Let's say I have a text in English and there is a word missing somewhere in it.
I have a list of candidate words from a dictionary with no other information. These candidate words are selected by some other, rather inaccurate, algorithm. I would like to use WordNet and the context around the missing word to assign probabilities to candidate words.
There is an obvious ad-hoc way to solve this that came to my mind: extract "interesting" words surrounding the missing word, calculate the semantic similarity of each candidate word with them according to some metric, and assign probabilities to the candidate words based on the average score. A rough sketch of what I mean follows.
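Something like this, assuming NLTK's WordNet interface and path similarity as the (arbitrary) choice of metric; the context and candidate words are purely illustrative:

    # Rank candidate words by their average WordNet similarity to the context words.
    import nltk
    from nltk.corpus import wordnet as wn

    nltk.download('wordnet')

    def best_similarity(word_a, word_b):
        """Maximum path similarity over all synset pairs of the two words."""
        best = 0.0
        for sa in wn.synsets(word_a):
            for sb in wn.synsets(word_b):
                sim = sa.path_similarity(sb)
                if sim is not None and sim > best:
                    best = sim
        return best

    def rank_candidates(context_words, candidates):
        scores = {}
        for cand in candidates:
            sims = [best_similarity(cand, ctx) for ctx in context_words]
            scores[cand] = sum(sims) / len(sims) if sims else 0.0
        total = sum(scores.values())
        # normalise the average similarities into (pseudo-)probabilities
        return {c: (s / total if total else 0.0) for c, s in scores.items()}

    context = ["engine", "fuel", "road"]          # "interesting" words around the gap
    candidates = ["car", "banana", "train"]
    print(rank_candidates(context, candidates))

The "probabilities" here are just normalised average similarities, which is exactly the ad-hoc part I am asking about.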
However, I was unable to find any useful research papers on this problem.
So, what I'm asking is whether you are aware of any research (papers) on this problem, what you think of my proposal, and whether you have a better idea.
You can start from Experiments: Enriching indirect answers. A good article is Semantic web access prediction using WordNet.
I have implemented sentiment analysis using the sentiment analysis module of Lingpipe. I know that it uses a dynamic LR model for this. It just tells me whether the test string carries positive or negative sentiment. What ideas could I use to determine the object for which the sentiment has been expressed?
If the text is categorized as positive sentiment, I would like to get the object for which the sentiment has been expressed - this could be a movie name, product name or others.
Although this question is really old, I would like to answer it for others' benefit.
What you want here is concept level sentiment analysis. For a very basic version, I would recommend following these steps:
Apply a sentence splitter. You can use either Lingpipe's sentence splitter or the OpenNLP Sentence Detector.
Apply part-of-speech tagging. Again, you can use either Lingpipe's POS tagger or the OpenNLP POS Tagger.
You then need to identify the token(s) tagged as nouns by the POS tagger. These tokens have the potential to be the targeted entity in the sentence.
Then you need to find the sentiment words in the sentence. The easiest way to do this is with a dictionary of sentiment-bearing words. You can find many such dictionaries online.
The next step is to find the dependency relations in the sentence. This can be achieved using the Stanford Dependency Parser. For example, if you try the sentence "This phone is good." in their online demo, you will see the following typed dependencies:
det(phone-2, This-1),
nsubj(good-4, phone-2),
cop(good-4, is-3),
root(ROOT-0, good-4)
The dependency nsubj(good-4, phone-2) here indicates that phone is the nominal subject of the token good, implying that the word good is expressed for phone. I am sure that your sentiment dictionary will contain the word good and phone would have been identified as a noun by the POS tagger. Thus, you can conclude that the sentiment good was expressed for the entity phone.
This was a very basic example. You can go a step further and create rules around the dependency relations to extract more complex sentiment-entity pairs. You can also assign scores to your sentiment terms and come up with a total score for the sentence depending upon the number of occurrences of sentiment words in that sentence.
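If you prefer Python, roughly the same pipeline can be sketched with spaCy, which handles sentence splitting, POS tagging, and dependency parsing in one pass. This is only a rough sketch with a toy sentiment lexicon, and spaCy's dependency labels differ slightly from the Stanford typed dependencies shown above (the copula is the head of the adjective), hence the small head-climbing step:

    # Sentiment -> entity linking with spaCy instead of the Stanford parser
    # (pip install spacy; python -m spacy download en_core_web_sm).
    import spacy

    nlp = spacy.load("en_core_web_sm")
    SENTIMENT_LEXICON = {"good": 1.0, "great": 1.0, "bad": -1.0, "terrible": -1.0}

    doc = nlp("This phone is good. The battery life is terrible.")
    for sent in doc.sents:                                   # sentence splitting
        for token in sent:
            score = SENTIMENT_LEXICON.get(token.lower_)
            if score is None:
                continue                                     # not a sentiment word
            # spaCy makes the copula the head of "good", so climb up one level
            # when the sentiment word is an adjectival complement (acomp).
            head = token.head if token.dep_ == "acomp" else token
            subjects = [c for c in head.children if c.dep_ in ("nsubj", "nsubjpass")]
            for subj in subjects:
                print(f"entity={subj.text!r} sentiment={token.text!r} score={score}")

For multi-word targets you would typically expand the subject to its subtree (subj.subtree) to recover phrases such as "battery life".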
Usually, in a sentiment-bearing sentence, the main entity of the sentence is the object of that sentiment. So a basic heuristic is to run NER and take the first entity. Otherwise, you should use a deep-parsing NLP toolkit and write rules to link the sentiment to its object.
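A tiny sketch of that heuristic, using spaCy's named entity recognizer (the example sentence and the "take the first entity" rule are only illustrations, not a robust solution):

    # "Run NER and take the first entity" heuristic with spaCy.
    import spacy

    nlp = spacy.load("en_core_web_sm")
    doc = nlp("David Cameron gave a surprisingly good speech yesterday.")
    entities = [ent.text for ent in doc.ents]
    target = entities[0] if entities else None   # fall back to None if no entity is found
    print(target)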
I am embarking upon a NLP project for sentiment analysis.
I have successfully installed NLTK for Python (it seems like a great piece of software for this). However, I am having trouble understanding how it can be used to accomplish my task.
Here is my task:
I start with one long piece of data (let's say several hundred tweets on the subject of the UK election, obtained from their web service).
I would like to break this up into sentences (or chunks no longer than 100 or so characters) (I guess I can just do this in Python?).
Then I want to search through all the sentences for specific instances within each sentence, e.g. "David Cameron".
Then I would like to check for positive/negative sentiment in each sentence and count them accordingly
NB: I am not really worried too much about accuracy because my data sets are large and also not worried too much about sarcasm.
Here are the troubles I am having:
All the data sets I can find, e.g. the movie review corpus that comes with NLTK, aren't in web-service format. It looks like they have already had some processing done. As far as I can see, the processing (by Stanford) was done with WEKA. Is it not possible for NLTK to do all this on its own? Here all the data sets have already been organised into positive/negative, e.g. the polarity dataset at http://www.cs.cornell.edu/People/pabo/movie-review-data/. How is this done? (To organise the sentences by sentiment, is it definitely WEKA, or something else?)
I am not sure I understand why WEKA and NLTK would be used together. It seems like they do much the same thing. If I'm processing the data with WEKA first to find sentiment, why would I need NLTK? Could you explain why this might be necessary?
I have found a few scripts that get somewhat close to this task, but they all use the same pre-processed data. Is it not possible to process this data myself to find the sentiment in sentences, rather than using the data samples given in the link?
Any help is much appreciated and will save me much hair!
Cheers Ke
The movie review data has already been marked by humans as being positive or negative (the person who made the review gave the movie a rating which is used to determine polarity). These gold standard labels allow you to train a classifier, which you could then use for other movie reviews. You could train a classifier in NLTK with that data, but applying the results to election tweets might be less accurate than randomly guessing positive or negative. Alternatively, you can go through and label a few thousand tweets yourself as positive or negative and use this as your training set.
For a description of using Naive Bayes for sentiment analysis with NLTK: http://streamhacker.com/2010/05/10/text-classification-sentiment-analysis-naive-bayes-classifier/
Then in that code, instead of using the movie corpus, use your own data to calculate word counts (in the word_feats method).
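For reference, a condensed sketch of that pipeline (this follows the movie-review example; to adapt it, build the (features, label) pairs from your own labelled tweets instead of the corpus):

    # Train a Naive Bayes sentiment classifier on the NLTK movie review corpus.
    import nltk
    from nltk.corpus import movie_reviews
    from nltk.classify import NaiveBayesClassifier
    from nltk.classify.util import accuracy

    nltk.download('movie_reviews')

    def word_feats(words):
        # bag-of-words features: each word is simply marked as present
        return {word: True for word in words}

    neg_ids = movie_reviews.fileids('neg')
    pos_ids = movie_reviews.fileids('pos')
    neg_feats = [(word_feats(movie_reviews.words(fileids=[f])), 'neg') for f in neg_ids]
    pos_feats = [(word_feats(movie_reviews.words(fileids=[f])), 'pos') for f in pos_ids]

    # hold out the last quarter of each class for a rough accuracy check
    ncut, pcut = int(len(neg_feats) * 0.75), int(len(pos_feats) * 0.75)
    train_feats = neg_feats[:ncut] + pos_feats[:pcut]
    test_feats = neg_feats[ncut:] + pos_feats[pcut:]

    classifier = NaiveBayesClassifier.train(train_feats)
    print('accuracy:', accuracy(classifier, test_feats))

    # classify a new piece of text, e.g. a tweet that mentions "David Cameron"
    tweet = "David Cameron gave a great speech on the election today"
    print(classifier.classify(word_feats(tweet.lower().split())))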
Why don't you use WSD? Use a word sense disambiguation tool to find the senses, and map polarity to the senses instead of the words. That way you will get somewhat more accurate results compared to simple word-level polarity.
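A minimal sketch of that suggestion, using NLTK's simplified Lesk implementation for the disambiguation step and SentiWordNet for the sense-level polarity (the sentence and target word are just examples):

    # Disambiguate a word in context, then look up that sense's SentiWordNet scores.
    import nltk
    from nltk.wsd import lesk
    from nltk.corpus import sentiwordnet as swn

    nltk.download('punkt')
    nltk.download('wordnet')
    nltk.download('sentiwordnet')

    sentence = "The movie was surprisingly good"
    tokens = nltk.word_tokenize(sentence)

    sense = lesk(tokens, 'good', 'a')            # pick the adjective sense that best fits the context
    if sense is not None:
        senti = swn.senti_synset(sense.name())   # e.g. 'good.a.01'
        print(sense.name(), senti.pos_score(), senti.neg_score())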