rapidminer and sentiment analysis - nlp

Is anyone out there used Rapidminer for sentiment analysis... Is this a right combination???
If not how do I get started with a simple sentiment analysis application??

RapidMiner is a very powerful text mining and sentiment analysis tools. I can recommend the RapidMiner training courses offered by Rapid-I. They gave me a really quick start. They also offer a dedicated course on text mining and sentiment analysis:
Sentiment Analysis, Opinion Mining, and Automated Market Research .
Starting in September or October 2009, they will also offer webinars. You should contact them directly, if you would like to learn more about their webinars. Several major online market research companies in Europe and the US are using RapidMiner for opinion mining and sentiment analysis from internet discussions groups and web blogs. For more details and references I would again suggest to simply ask their team at contact(at)rapid-i.com or check their RapidMiner forum at forum.rapid-i.com .
Best regards,
Frank

This series of videos should help:
http://vancouverdata.blogspot.com/2010/11/text-analytics-with-rapidminer-loading.html

When I go to rapid miner site it is confusing me.
http://rapidminer.com/solutions/sentiment-analysis/
"It looks like a crowd sourcing to identify the polarity of product reviews and discussions around the web." If you are looking to automate in real time this might not be a good solution.
spotdy.com offers free NLP for developers. It works pretty cool.
Most of the Sentiment Analysis software tokenize words and giving a positive and negative factor and sum those up. Since language is contextual, this leads to ignoring the context which is not a right way to do.
Instead deep learning models, HMM based on sentence structure. It computes the sentiment based on how words are composed in a sentence. Check out spotdy.com. It is free.

Related

simple nltk sentiment analysis code using python3

I am trying to do some classification on customer emails.
Is the email happy or sad (sentiment analysis)
Is the email related to billing or not.
I am using Python3 and think I have to use nltk and scikit
NLTK - will help understand and read the text I beleive
scikit - will do the classification (happy, sad and billing or not)
Training data set 1: A few phrases...anywhere from one word to a sentence with 5 to 6 words.
(1 being happy and 0 being not happy)...a few examples below
Apprecaite the help..1
great job..1
Awesome..1
terrible..0
confusing...0
slow down...0
Training data set 2: a few phrases indicating billing related question..(few examples below)
question on my bill
billing fee
my bill is too high
payment rejected
Now this seems to be straight forward from a concept stand point
where can I find some basic code, that will tell me
how I can use my own training data
how I can load the email text as input and spit out an answer happy or sad...and billing or not.
Regarding your data sets, your approach is nearly lexicon-based as the items contains very few words.
For billing, the lexicon-based approach should be a good idea. You should give importance to the subjects of the emails.
For sentiment analysis you have two options:
Machine learning: In this case you should use a bigger data set (in my view, each item should be a full email). You can implement a Naive Bayes classifier following this tutorial.
Lexicon-based approach: There are several lexicons for sentiment analysis e.g. SentiWordNet (downloadable from nltk.download()), MPQA, SentiStrength, WordNet-Affect via WNAffect,... Preprocessings: tokenization (nltk.word_tokenize()) and POS tagging (nltk.pos_tag(text)). You should also think about negation (polarity shifting is a good approach to manage with negation).
Machine Learning provide best results so if you have enough annotated emails it is the good choice.

Entity Recognition and Sentiment Analysis using NLP

So, this question might be a little naive, but I thought asking the friendly people of Stackoverflow wouldn't hurt.
My current company has been using a third party API for NLP for a while now. We basically URL encode a string and send it over, and they extract certain entities for us (we have a list of entities that we're looking for) and return a json mapping of entity : sentiment. We've recently decided to bring this project in house instead.
I've been studying NLTK, Stanford NLP and lingpipe for the past 2 days now, and can't figure out if I'm basically reinventing the wheel doing this project.
We already have massive tables containing the original unstructured text and another table containing the extracted entities from that text and their sentiment. The entities are single words. For example:
Unstructured text : Now for the bed. It wasn't the best.
Entity : Bed
Sentiment : Negative
I believe that implies we have training data (unstructured text) as well as entity and sentiments. Now how I can go about using this training data on one of the NLP frameworks and getting what we want? No clue. I've sort of got the steps, but not sure:
Tokenize sentences
Tokenize words
Find the noun in the sentence (POS tagging)
Find the sentiment of that sentence.
But that should fail for the case I mentioned above since it talks about the bed in 2 different sentences?
So the question - Does any one know what the best framework would be for accomplishing the above tasks, and any tutorials on the same (Note: I'm not asking for a solution). If you've done this stuff before, is this task too large to take on? I've looked up some commercial APIs but they're absurdly expensive to use (we're a tiny startup).
Thanks stackoverflow!
OpenNLP may also library to look at. At least they have a small tutuorial to train the name finder and to use the document categorizer to do sentiment analysis. To trtain the name finder you have to prepare training data by taging the entities in your text with SGML tags.
http://opennlp.apache.org/documentation/1.5.3/manual/opennlp.html#tools.namefind.training
NLTK provides a naive NER tagger along with resources. But It doesnt fit into all cases (including finding dates.) But NLTK allows you to modify and customize the NER Tagger according to the requirement. This link might give you some ideas with basic examples on how to customize. Also if you are comfortable with scala and functional programming this is one tool you cannot afford to miss.
Cheers...!
I have discovered spaCy lately and it's just great ! In the link you can find comparative for performance in term of speed and accuracy compared to NLTK, CoreNLP and it does really well !
Though to solve your problem task is not a matter of a framework. You can have two different system, one for NER and one for Sentiment and they can be completely independent. The hype these days is to use neural network and if you are willing too, you can train a recurrent neural network (which has showed best performance for NLP tasks) with attention mechanism to find the entity and the sentiment too.
There are great demo everywhere on the internet, the last two I have read and found interesting are [1] and [2].
Similar to Spacy, TextBlob is another fast and easy package that can accomplish many of these tasks.
I use NLTK, Spacy, and Textblob frequently. If the corpus is simple, generic, and straightforward, Spacy and Textblob work well OOTB. If the corpus is highly customized, domain-specific, messy (incorrect spelling or grammar), etc. I'll use NLTK and spend more time customizing my NLP text processing pipeline with scrubbing, lemmatizing, etc.
NLTK Tutorial: http://www.nltk.org/book/
Spacy Quickstart: https://spacy.io/usage/
Textblob Quickstart: http://textblob.readthedocs.io/en/dev/quickstart.html

Sources of classified sentiment data?

I'm looking to train a naive Bayes with some new data sources that haven't been used before. I've already looked at the Lee & Pang corpus of IMDB reviews and the MPQA opinion corpus. I'm looking for new web services that fit the following criteria.
Easily Classified - must have a like/dislike or 5 star rating
Readily available
Pertain to new material (less important than the first two)
Here are some samples I have come up with on my own.
Etsy API
Rotten Tomatoes API
Yelp API
Any other suggestions would be much appreciated =)
In Pang&Lee's later work (2008) "Opinion Mining and Sentiment Analysis" here they have a section for publicly available resources. It has links to those corpora.
Take a look at sentiment140. It has a corpus that you can download and train with. You can easily extend to new tweets.

How to rate tweet comment as positive or negative using Word Sentiment

I want to categorize comments as positive or negative based on the content.
This is a problem of NLP(Natural Lang Processing) and i am finding difficulties in implementing this.
Check out this blog post. The author describes how to build a Twitter Sentiment Classifier with Python and NLTK. Looks like a good start, as sentiment analysis is no easy task with lots of active research going on in the field.
Also search SO for Sentiment Analysis, I believe there already are many useful answers about this topic on the site.
Here is, Combination of Semi Supervised co-occurance based and unsupervised WSD based classifier. Its in Python though. And you need nltk, wordnet, SentiWord-net and movie review corpus which comes with nltk.
https://github.com/kevincobain2000/sentiment_classifier
The problem is quite complex, anyway I love Pattern: http://www.clips.ua.ac.be/pages/pattern-examples-elections
If you are not categorizing a lot of comments you may wish to try using the chatterboax API
Else you can use Linpipe, but you will have to train your models

Simple Sentiment Analysis

It appears that the simplest, naivest way to do basic sentiment analysis is with a Bayesian classifier (confirmed by what I'm finding here on SO). Any counter-arguments or other suggestions?
A Bayesian classifier with a bag of words representation is the simplest statistical method. You can get significantly better results by moving to more advanced classifiers and feature representation, at the cost of more complexity.
Statistical methods aren't the only game in town. Rule based methods that have more understanding of the structure of the text are the other main option. From what I have seen, these don't actually perform as well as statistical methods.
I recommend Manning and Schütze's Foundations of Statistical Natural Language Processing chapter 16, Text Categorization.
I can't think of a simpler, more naive way to do Sentiment Analysis, but you might consider using a Support Vector Machine instead of Naive Bayes (in some machine learning toolkits, this can be a drop-in replacement). Have a look at "Thumbs up? Sentiment Classification using Machine Learning Techniques" by Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan which was one of the earliest papers on these techniques, and gives a good table of accuracy results on a family of related techniques, none of which are any more complicated (from a client perspective) than any of the others.
Building upon the answer provided by Ken above, there is another paper
"Sentiment analysis using support vector machines with diverse information sources" by Tony and Niger,
which looks at assigning more features than just a bag of words used by Pang and Lee. Here, they leverage wordnet to determine semantic differentiation of adjectives, and proximity of the sentiment towards the topic in the text, as additional features for SVM. They show better results than previous attempts to classify text based on sentiment.

Resources