Trying to analyze text and sentiment - NLP

I am trying to analyze text and sentiment data, but I don't want an extensive analysis. I just want a basic distribution of good, neutral and bad, and the percentage of each category.
Can anyone offer some advice or suggestions?
Thank you all!

WinkNLP can measure sentiment on a scale of -1 to +1 for the entire document or its sentences. Here is an Observable notebook with a live example.
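The question only asks for a good/neutral/bad breakdown with percentages. WinkNLP is a JavaScript library; since the rest of this thread leans on Python, here is a rough sketch of the same idea using NLTK's VADER analyzer instead (a different tool than the one named above, shown only as an assumption-laden illustration):

```python
# Bucket texts into good/neutral/bad with NLTK's VADER analyzer, which also
# scores on a -1..+1 compound scale. The +/-0.05 cutoffs are VADER's
# conventional thresholds, not something prescribed in this thread.
from collections import Counter
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)
sia = SentimentIntensityAnalyzer()

texts = ["I love this!", "It was okay.", "Terrible experience."]  # toy data
counts = Counter()
for t in texts:
    compound = sia.polarity_scores(t)["compound"]
    if compound >= 0.05:
        counts["good"] += 1
    elif compound <= -0.05:
        counts["bad"] += 1
    else:
        counts["neutral"] += 1

for label in ("good", "neutral", "bad"):
    print(f"{label}: {counts[label] / len(texts):.0%}")
```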

Related

Detecting questions in text

I have a project where I need to analyze a text to work out whether the user who posted it needs help with something. I tried sentiment analysis, but it didn't work as expected. My idea was to take the negative posts, extract the main words from each, and suggest some articles about that subject to the user. If there is another approach that could help, please post it below. Thanks.
The dataset I used was built for sentiment analysis, but I have since found that it doesn't work for this task, so I need a dataset suited to this subject.
Apply standard NLP methods before running the sentiment analysis. Use TF-IDF or Word2Vec to create vectors from the given dataset, and then try the sentiment analysis on top of those vectors. You may also want GloVe vectors for the analysis; see the TF-IDF sketch below.
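A minimal sketch of that vectorization step, assuming scikit-learn (the answer names no specific library) and a tiny hypothetical labelled set:

```python
# TF-IDF vectors plus a simple classifier on top; logistic regression is an
# arbitrary baseline choice here, not something the answer prescribes.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

posts = ["please help me fix this error",
         "lovely weather today",
         "can anyone assist with my setup",
         "just sharing some photos"]       # hypothetical posts
labels = [1, 0, 1, 0]                      # 1 = asks for help (hypothetical)

vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(posts)

clf = LogisticRegression().fit(X, labels)
print(clf.predict(vectorizer.transform(["could someone help me with this"])))
```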
For this topic, I found that this area of machine learning is called "Natural Language Questions": models are trained to detect questions in text and suggest answers for them based on the dataset you are working with. Check this article for more detail.
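As a toy illustration of what such models learn to do, here is a naive rule-based baseline for question detection (this is not from the linked article, which isn't reproduced here; real systems learn this rather than hard-coding it):

```python
# A naive question-detection heuristic: trailing "?" or an interrogative
# lead word. Only a toy baseline, not a trained model.
QUESTION_LEADS = ("who", "what", "when", "where", "why", "how",
                  "is", "are", "can", "could", "do", "does", "should")

def looks_like_question(sentence: str) -> bool:
    s = sentence.strip().lower()
    if not s:
        return False
    return s.endswith("?") or s.split()[0] in QUESTION_LEADS

for s in ["How do I reset my password", "Thanks for the photos!"]:
    print(s, "->", looks_like_question(s))
```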

How to train and test data for classification using Machine learning algorithms

I have collected tweets from the Twitter API. The tweets are not labelled and I have no clue where to start. All the tutorials use already-labelled data. How is data labelled? Can labelling only be done manually? Any good tutorial answering these queries would be a great help.
I assume that when you extract the data from the Twitter API, it comes in JSON format. Use the key/value pairs as your dataframe headings and values. Now, as for the labels, it depends on what you are going to do with the dataset. If you want to do sentiment analysis, you need to mark the dataset manually (or just download a pre-labelled Twitter dataset from the internet). A sketch of the JSON-to-dataframe step is below.
For reference, here is a great tutorial on how to mine and deal with the raw data, get insight, and apply clustering algorithms. Hope it helps!
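A minimal sketch of that JSON-to-dataframe step, assuming the raw API output was saved one JSON object per line (the file name and field names here are illustrative):

```python
# Flatten selected key/value pairs from raw tweets into a pandas DataFrame,
# leaving a label column to be filled in by hand or from a pre-labelled set.
import json
import pandas as pd

records = []
with open("tweets.jsonl") as f:        # hypothetical file of raw API output
    for line in f:
        tweet = json.loads(line)
        records.append({
            "id": tweet.get("id"),
            "text": tweet.get("text"),
            "created_at": tweet.get("created_at"),
        })

df = pd.DataFrame(records)
df["label"] = None  # to be filled in manually, or joined from labelled data
print(df.head())
```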

What is an appropriate training set size for text classification (Sentiment analysis)

I just wanted to understand (from your experience) what a good training data size would be if I have to create a sentiment analysis classification model (using NLTK). For instance, if my training data is going to contain tweets and I intend to classify them as positive, negative, or neutral, how many tweets should I ideally have per category to get a reasonable model working?
I understand that there are many factors, such as data quality, but what might be a good number for getting started?
That's a really hard question to answer for people who are not familiar with the exact data, its labelling and the application you want to use it for. But as a ballpark estimate, I would say start with 1,000 examples of each and go from there.

How to rate tweet comment as positive or negative using Word Sentiment

I want to categorize comments as positive or negative based on their content.
This is a natural language processing (NLP) problem, and I am having difficulty implementing it.
Check out this blog post. The author describes how to build a Twitter Sentiment Classifier with Python and NLTK. Looks like a good start, as sentiment analysis is no easy task with lots of active research going on in the field.
Also search SO for Sentiment Analysis, I believe there already are many useful answers about this topic on the site.
Here is a combination of a semi-supervised co-occurrence-based classifier and an unsupervised WSD-based classifier. It's in Python, though, and you need NLTK, WordNet, SentiWordNet, and the movie review corpus that ships with NLTK.
https://github.com/kevincobain2000/sentiment_classifier
The problem is quite complex. Anyway, I love Pattern: http://www.clips.ua.ac.be/pages/pattern-examples-elections
If you are not categorizing a lot of comments, you may wish to try the Chatterbox API.
Otherwise you can use LingPipe, but you will have to train your own models.

Sentiment analysis with NLTK python for sentences using sample data or webservice?

I am embarking upon an NLP project for sentiment analysis.
I have successfully installed NLTK for Python (it seems like a great piece of software for this). However, I am having trouble understanding how to use it to accomplish my task.
Here is my task:
I start with one long piece of data (let's say several hundred tweets on the subject of the UK election, pulled from a webservice).
I would like to break this up into sentences (or chunks no longer than about 100 characters). (I guess I can just do this in Python?)
Then I want to search all the sentences for specific mentions within each sentence, e.g. "David Cameron".
Then I would like to check each sentence for positive/negative sentiment and count them accordingly.
NB: I am not really worried too much about accuracy, because my data sets are large, and I am also not too worried about sarcasm.
Here are the troubles I am having:
All the data sets I can find, e.g. the movie review corpus that comes with NLTK, aren't in webservice format. It looks like some processing has already been done on them. As far as I can see, the processing (by Stanford) was done with WEKA. Is it not possible for NLTK to do all this on its own? All of these data sets have already been organised into positive/negative, e.g. the polarity dataset at http://www.cs.cornell.edu/People/pabo/movie-review-data/ . How is this done? (To organise the sentences by sentiment, is it definitely WEKA, or something else?)
I am not sure I understand why WEKA and NLTK would be used together. It seems like they do much the same thing. If I'm processing the data with WEKA first to find sentiment, why would I need NLTK? Can someone explain why this might be necessary?
I have found a few scripts that get somewhat near this task, but all of them use the same pre-processed data. Is it not possible to process this data myself to find sentiment in sentences, rather than using the data samples given in the link?
Any help is much appreciated and will save me much hair!
Cheers Ke
The movie review data has already been marked by humans as being positive or negative (the person who made the review gave the movie a rating which is used to determine polarity). These gold standard labels allow you to train a classifier, which you could then use for other movie reviews. You could train a classifier in NLTK with that data, but applying the results to election tweets might be less accurate than randomly guessing positive or negative. Alternatively, you can go through and label a few thousand tweets yourself as positive or negative and use this as your training set.
For a description of using Naive Bayes for sentiment analysis with NLTK: http://streamhacker.com/2010/05/10/text-classification-sentiment-analysis-naive-bayes-classifier/
Then in that code, instead of using the movie corpus, use your own data to calculate word counts (in the word_feats method).
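A minimal sketch of that substitution, assuming your own labelled tweets in place of the movie review corpus (the tweet lists and labels here are hypothetical; word_feats mirrors the bag-of-words feature function in the linked post):

```python
# Train NLTK's Naive Bayes classifier on hand-labelled tweets instead of
# the movie review corpus, following the structure of the linked tutorial.
import nltk

def word_feats(words):
    # Bag-of-words features: each token present maps to True.
    return {word: True for word in words}

pos_tweets = ["great speech from david cameron today".split(),
              "loving this election coverage".split()]   # hypothetical
neg_tweets = ["what a disaster of a debate".split(),
              "awful result for the country".split()]    # hypothetical

train_set = ([(word_feats(t), "pos") for t in pos_tweets] +
             [(word_feats(t), "neg") for t in neg_tweets])

classifier = nltk.NaiveBayesClassifier.train(train_set)
print(classifier.classify(word_feats("great debate tonight".split())))
```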
Why don't you use WSD? Use a word-sense disambiguation tool to find the senses, and map polarity to the senses instead of to the words. That way you will get somewhat more accurate results than with word-level polarity. A sketch of the idea is below.
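A minimal sketch of that sense-level idea, assuming NLTK's Lesk implementation for the disambiguation step and SentiWordNet for sense-level polarity (neither tool is named in the answer above):

```python
# Disambiguate a word in context with Lesk, then read polarity off the
# chosen sense via SentiWordNet rather than off the surface word.
import nltk
from nltk.wsd import lesk
from nltk.corpus import sentiwordnet as swn

for resource in ("punkt", "wordnet", "sentiwordnet"):
    nltk.download(resource, quiet=True)

sentence = "The film was a solid piece of work"
tokens = nltk.word_tokenize(sentence)

synset = lesk(tokens, "solid", "a")  # pick the adjective sense in context
if synset is not None:
    senti = swn.senti_synset(synset.name())
    print(synset.name(), senti.pos_score(), senti.neg_score())
```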
