Can someone give me some starting points for getting started with sentiment analysis?
It would be great if you could provide some open source tools that can be used for that task.
Currently I am looking at GATE (http://gate.ac.uk) and RapidMiner (http://rapid-i.com/), but I think I am in the middle of nowhere and I lack the basics to get started with these tools...
It would be helpful if someone who has prior experience with GATE/RapidMiner explained how to start working with these.
Both GATE and RapidMiner are powerful text mining and sentiment analysis tools. I personally prefer RapidMiner because I found it easier to learn, and the RapidMiner training courses provided by Rapid-I gave me a really quick start. They offer a dedicated course on text mining and sentiment analysis:
Sentiment Analysis, Opinion Mining, and Automated Market Research.
Starting in September or October 2009, they will also offer webinars. You should contact them directly if you would like to learn more about their webinars. Several major online market research companies in Europe and the US are using RapidMiner for opinion mining and sentiment analysis of internet discussion groups and web blogs. For more details and references, I would again suggest simply asking their team at contact(at)rapid-i.com or checking their RapidMiner forum at forum.rapid-i.com.
Best regards,
Frank
Another option is to use LingPipe, which has a very nice tutorial to jumpstart you in sentiment analysis. LingPipe has an integration for GATE, which you can read more about here.
For GATE specifically, the materials from our regular training courses are available on the wiki. In particular, module 12 from the June 2012 course discusses how to use GATE for opinion mining.
Related
I need to translate Spanish tweets into English for my research. I have found several toolkits. Among them, Moses is used in some research papers, and other emerging toolkits use it as a baseline for evaluation, so I am considering it as a candidate. I also found a toolkit from Stanford University called Phrasal, which also seems to be good. The last one I found is from the well-known NLTK library, which also has a translate package. Each of them states that it uses phrase-based statistical machine translation along with some other techniques. My question is: from a practical or theoretical point of view, which would be best for translating tweets? Or would the Google Translate API be the best solution?
I know of the following open source tools, but I haven't found any comparisons of how good they are respectively.
Tools with ready to use phrase extraction:
KEA
MAUI (http://code.google.com/p/maui-indexer/)
Dragon, xTract (http://dragon.ischool.drexel.edu/xtract.asp)
Lingpipe (http://alias-i.com/lingpipe/demos/tutorial/interestingPhrases/read-me.html)
Mahout (https://cwiki.apache.org/MAHOUT/collocations.html)
Anything else
Did anyone ever see such a comparison?
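In the absence of a published comparison, one way to compare the tools yourself is to run each of them on documents that have hand-assigned key phrases and score the output. Below is a minimal sketch of such a scorer in Python. It assumes exact matching after lowercasing and whitespace normalization, which is only one possible matching rule (stemmed or partial matching are alternatives), and the phrase lists and the names `tool_a_output` and `tool_b_output` are made up for illustration, not real output from any tool listed above.

```python
def normalize(phrase):
    """Lowercase a phrase and collapse runs of whitespace."""
    return " ".join(phrase.lower().split())


def precision_recall_f1(extracted, gold):
    """Score extracted phrases against gold phrases using exact matching."""
    extracted_set = {normalize(p) for p in extracted}
    gold_set = {normalize(p) for p in gold}
    true_positives = len(extracted_set & gold_set)
    precision = true_positives / len(extracted_set) if extracted_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical gold phrases and tool outputs, for illustration only.
gold_phrases = ["machine translation", "sentiment analysis", "key phrase extraction"]
tool_a_output = ["Sentiment Analysis", "machine translation", "open source"]
tool_b_output = ["sentiment analysis"]

for name, output in [("tool A", tool_a_output), ("tool B", tool_b_output)]:
    p, r, f = precision_recall_f1(output, gold_phrases)
    print(f"{name}: precision={p:.2f} recall={r:.2f} F1={f:.2f}")
```

Averaging these scores over many documents gives a rough ranking of the tools on your own data.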
MAUI outperforms KEA on my experiments.
There is a comparison of unsupervised automatic key phrase extraction methods (Coling 2010 paper), but it doesn't analyse supervised methods. I'm planning to do that in the near future.
In addition, I've also explored a richer set of features, which improved the performance of automatic key phrase extraction, although it is still far from perfect. I might release the extended version of MAUI with those extensions next year.
Please read the following papers or email me for more details:
Supervised Topical Key Phrase Extraction of News Stories using Crowdsourcing, Light Filtering and Co-reference Normalization
Keyphrase Cloud Generation of Broadcast News
I like Mallet because it has a command line tool that is really easy to use.
I am trying to build an NLP corpus for an under-resourced language, as there is no data available for NLP research. Can anyone suggest how to build it, or how to proceed to make it a standard NLP corpus? Any standard method, paper, or link would help.
Thanks in advance
I would suggest contacting someone like Fei Xia at the University of Washington, who has worked on the Penn Treebank and is something of an expert at that, or some of the people at Penn.
Building a full on treebank for parsing and tagging is not a trivial task. What exactly are you trying to do? What's the goal?
-parsing/tagging?
-semantics?
-information extraction?
-phonetics?
Honestly, as per the comments, this sounds like a project for an entire team of linguists.
What approaches are there to generating questions from a sentence? Let's say I have a sentence "Jim's dog was very hairy and smelled like wet newspaper" - which toolkit is capable of generating a question like "What did Jim's dog smell like?" or "How hairy was Jim's dog?"
Thanks!
Unfortunately there isn't one, exactly. There is some code written as part of Michael Heilman's PhD dissertation at CMU; perhaps you'll find it and its corresponding papers interesting?
If it helps, the topic you want information on is called "question generation". This is pretty much the opposite of what Watson does, even though "here is an answer, generate the corresponding question" is exactly how Jeopardy is played. But actually, Watson is a "question answering" system.
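To make the task concrete, here is a toy sketch in Python that produces the two questions from the example sentence using hand-written patterns. This is an illustration only, not the method from the dissertation mentioned above: the function name `generate_questions` and both patterns are assumptions made up for this example, and they will misfire on most other sentences (for instance, "The dog was running" would yield "How running was The dog?"). Real systems work from a syntactic parse rather than regular expressions.

```python
import re

# Hypothetical pattern: "<subject> was/is [very] <adjective> ..."
COPULA_PATTERN = re.compile(
    r"(?P<subject>.+?) (?P<verb>was|is) (?:very |really |quite )?(?P<adjective>\w+)"
)
# Hypothetical pattern: the sentence mentions "smelled like <something>".
SMELL_PATTERN = re.compile(r"\bsmelled like \w+")


def generate_questions(sentence):
    """Return template-based questions for a sentence, or an empty list."""
    text = sentence.strip().rstrip(".!")
    questions = []
    copula = COPULA_PATTERN.match(text)
    if copula:
        subject = copula.group("subject")
        questions.append(
            f"How {copula.group('adjective')} {copula.group('verb')} {subject}?"
        )
        if SMELL_PATTERN.search(text):
            questions.append(f"What did {subject} smell like?")
    return questions


for question in generate_questions(
    "Jim's dog was very hairy and smelled like wet newspaper"
):
    print(question)
```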
In addition to the link to Michael Heilman's PhD provided by dmn, I recommend checking out the following papers:
Automatic Question Generation and Answer Judging: A Q&A Game for Language Learning (Yushi Xu, Anna Goldie, Stephanie Seneff)
Automatic Question Generation from Sentences (Husam Ali, Yllias Chali, Sadid A. Hasan)
As of 2022, Haystack provides a comprehensive suite of tools for question generation and question answering using recent Transformer models and transfer learning.
From their website:
Haystack is an open-source framework for building search systems that work intelligently over large document collections. Recent advances in NLP have enabled the application of question answering, retrieval and summarization to real world settings and Haystack is designed to be the bridge between research and industry.
NLP for Search: Pick components that perform retrieval, question answering, reranking and much more
Latest models: Utilize all transformer based models (BERT, RoBERTa, MiniLM, DPR) and smoothly switch when new ones get published
Flexible databases: Load data into and query from a range of databases such as Elasticsearch, Milvus, FAISS, SQL and more
Scalability: Scale your system to handle millions of documents and deploy them via REST API
Domain adaptation: All tooling you need to annotate examples, collect user-feedback, evaluate components and finetune models.
In my personal experience, I was 95% successful in generating questions and answers for training purposes during my internship. I have a sample web user interface to demonstrate this, along with the code: My Web App and Code.
Huge shoutout to the developers on the Slack channel for helping noobs in AI like me! Implementing and deploying an NLP model has never been easier than with Haystack. I believe this is the only tool out there with which one can easily both develop and deploy.
Disclaimer: I do not work for deepset.ai or Haystack; I am just a fan of Haystack.
As of 2019, question generation from text has become possible. There are several research papers on this task.
The current state-of-the-art question generation model uses language modeling with different pretraining objectives. The research paper, code implementation and pre-trained model are available to download from the Papers with Code website link.
This model can be fine-tuned on your own dataset (instructions for fine-tuning are given here).
I would suggest checking out this link for more solutions. I hope it helps.
One simple question (but I haven't quite found an obvious answer in the NLP stuff I've been reading, which I'm very new to):
I want to classify emails with a probability along certain dimensions of mood. Is there an NLP package out there specifically dealing with this? Is there an obvious starting point in the literature where I should start reading?
For example, if I got a short email something like "Hi, I'm not very impressed with your last email - you said the order amount would only be $15.95! Regards, Tom" then it might get 8/10 for Frustration and 0/10 for Happiness.
The actual list of moods isn't so important, but a short list of generally positive vs generally negative moods would be useful.
Thanks in advance!
--Trindaz on Fedang #NLP
You can do this with a number of different NLP tools, but nothing to my knowledge comes with it ready out of the box. Perhaps the easiest place to start would be with LingPipe (Java), and you can use their very good sentiment analysis tutorial. You could also use NLTK if Python is more your bent. There are some good blog posts over at Streamhacker that describe how you would use Naive Bayes to implement that.
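To give a feel for the Naive Bayes approach mentioned above, here is a minimal sketch in plain Python with no external libraries. It is not the NLTK or LingPipe API: the class `NaiveBayesMoodClassifier`, the two mood labels, and the six training emails are all made up for illustration. It uses a multinomial model with add-one smoothing and ignores words it never saw in training. A real classifier would need far more labeled data.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and keep runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesMoodClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> number of emails
        self.vocab = set()

    def train(self, labeled_emails):
        for text, label in labeled_emails:
            self.doc_counts[label] += 1
            for token in tokenize(text):
                self.word_counts[label][token] += 1
                self.vocab.add(token)

    def predict_proba(self, text):
        """Return a dict mapping each mood label to a probability."""
        tokens = [t for t in tokenize(text) if t in self.vocab]
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for label, n_docs in self.doc_counts.items():
            total_words = sum(self.word_counts[label].values())
            score = math.log(n_docs / total_docs)
            for token in tokens:
                smoothed = self.word_counts[label][token] + 1
                score += math.log(smoothed / (total_words + len(self.vocab)))
            log_scores[label] = score
        # Normalize in log space to avoid underflow on long emails.
        top = max(log_scores.values())
        exps = {label: math.exp(s - top) for label, s in log_scores.items()}
        norm = sum(exps.values())
        return {label: value / norm for label, value in exps.items()}


# Hypothetical labeled emails, for illustration only.
TRAINING_EMAILS = [
    ("I'm not impressed with the delay on my order", "frustration"),
    ("This is the second wrong invoice and I am very disappointed", "frustration"),
    ("You said it would cost less, I want a refund", "frustration"),
    ("Thanks so much, the delivery was quick and great", "happiness"),
    ("Great service, I am delighted with the order", "happiness"),
    ("Thank you, everything arrived perfectly", "happiness"),
]

classifier = NaiveBayesMoodClassifier()
classifier.train(TRAINING_EMAILS)
probabilities = classifier.predict_proba(
    "Hi, I'm not very impressed with your last email - you said the order "
    "amount would only be $15.95! Regards, Tom"
)
for mood, probability in sorted(probabilities.items()):
    print(f"{mood}: {probability:.2f}")
```

Multiplying each probability by 10 gives scores on the 0 to 10 scale from the question. Note that this sketch makes the moods compete for a total of 1; scoring each mood independently would need one binary classifier per mood.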
Check out AlchemyAPI for sentiment analysis tools and scikit-learn or any other open machine learning library for the classifier.
If you have not decided to code the implementation yourself, you can also have the data classified by some other tool. The Google Prediction API may be an alternative.
Either way, you will need some labeled data and will have to do the preprocessing. But using a tool may help you get better accuracy more easily.