Logical Semantics, Information Extraction and Summarization

I would like a general idea about these tasks in the field of data analysis and NLP. What steps are involved if I want to retrieve the meaningful information from a domain-specific text and capture the general idea of any text?
Another question: does analyzing a larger text give better results?
Excuse my ignorance. I want to understand more, and it would help me a lot if you suggested some tutorials or readings.

I suggest "Speech and Language Processing" by Daniel Jurafsky & James H. Martin. The last chapters are about Information Extraction and Summarization.
As for your question about text size, it depends. From my experience, Information Extraction works better with short sentences. However, you will need a large data set to train your system to recognize relevant patterns.

Natural Language Processing (syntactic, semantic, pragmatic) Analysis

My text contains text="Ravi beat Ragu"
My question will be "Who beat Ragu?"
The answer should come out as "Ravi", using NLP.
How do I do this with natural language processing?
Kindly guide me on how to proceed with syntactic, semantic and pragmatic analysis using Python.
I would suggest that you read an introductory book on NLP to become familiar with the chain of processes you are trying to achieve. You are trying to do question answering, aren't you? If so, you should read about question-answering systems. The sentence above has to be morphologically analyzed (so read about morphological analyzers), syntactically parsed (so read about syntactic parsing) and semantically understood (so read about anaphora resolution and, in linguistics, theta theory). Ravi is called the agent, and Ragu the patient or experiencer. Only then can you proceed to pursue your objectives.
I hope this helps you!
You need lots of data.
What you are looking for might be reading comprehension, so I recommend you read the papers by the competitors on SQuAD (for instance, "Reading Comprehension on the SQuAD Dataset").
For simple tasks (implying limited generalization), and if you don't have such a large corpus, you can use regexes by manually writing some rules, as in the sketch below.
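
A minimal sketch of that rule-based regex idea, assuming a fixed subject-verb-object word order and a hypothetical "Who <verb> <object>?" question template:

    import re

    text = "Ravi beat Ragu"
    question = "Who beat Ragu?"

    # Turn "Who <verb> <object>?" into a pattern that captures the subject.
    match = re.match(r"Who (\w+) (\w+)\?", question)
    if match:
        verb, obj = match.groups()
        # Look for "<subject> <verb> <object>" in the text.
        answer = re.search(rf"(\w+) {verb} {obj}", text)
        if answer:
            print(answer.group(1))  # -> Ravi

This breaks as soon as the wording varies, which is exactly why the parsing-based approaches above are the serious route.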

Simple toolkits for emotion (sentiment) analysis (not using machine learning)

I am looking for a tool that can analyze the emotion of short texts. I have searched for a week and couldn't find a good one that is publicly available. The ideal tool takes a short text as input and guesses the emotion; it is preferably a standalone application or library.
I don't need tools that have to be trained on texts, and although similar questions have been asked before, no satisfactory answers were given.
I searched the Internet and read some papers, but I can't find the tool I want. So far I have found SentiStrength, but its accuracy is not good. I am using emotion dictionaries right now. I feel that some syntactic parsing may be necessary, but it is too complex for me to build, and since this has already been researched by others I don't want to reinvent the wheel. Does anyone know of such publicly/research-available software? I need a tool that does not need training before use.
Thanks in advance.
I think that you will not find a more accurate program than SentiStrength (or SO-CAL) for this task, other than machine-learning methods in a specific narrow domain. If you have a lot (>1000) of hand-coded examples for a specific domain, then you might like to try a generic machine-learning approach based on your data. If not, then I would stop looking for anything better ;)
Identifying entities and extracting precise information, let alone sentiment, is a very challenging problem, especially with short texts, because of the lack of context. However, there are a few unsupervised approaches to extracting sentiment from texts, mainly proposed by Turney (2002). Look at that work; maybe you can adopt its method of extracting sentiment based on the adjectives in the short text for your use case. It is, however, important to note that this might require you to POS-tag your short text reliably, as in the sketch below.
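
A rough sketch of that adjective-plus-lexicon idea with NLTK, where the tiny hand-made dictionary is only a stand-in for a real emotion lexicon (the "punkt" and POS-tagger models must be downloaded first via nltk.download):

    import nltk

    # Toy lexicon; a real system would plug in a proper emotion dictionary.
    LEXICON = {"wonderful": 2, "good": 1, "bad": -1, "terrible": -2}

    def score(text):
        tagged = nltk.pos_tag(nltk.word_tokenize(text))
        # Keep only adjectives (Penn Treebank tags JJ, JJR, JJS).
        adjectives = [w.lower() for w, tag in tagged if tag.startswith("JJ")]
        return sum(LEXICON.get(w, 0) for w in adjectives)

    print(score("The plot was wonderful but the acting was terrible."))  # -> 0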
Maybe EmoLib could be of help.

NLP: Language Analysis Techniques and Algorithms

Situation:
I wish to perform a deep analysis of a given text, which means:
1. Ability to extract keywords and assign importance levels based on contextual usage.
2. Ability to draw conclusions about the mood expressed.
3. Ability to hint at the education level (Word does this a little, but I want something more automated).
4. Ability to mix and match phrases and find certain communication patterns.
5. Ability to draw substantial meaning out of the text, so that it can be quantified and processed for answering by a machine.
Question:
What kind of algorithms and techniques need to be employed for this?
Is there software that can help me do this?
When you figure out how to do this please contact DARPA, the CIA, the FBI, and all other U.S. intelligence agencies. Contracts for projects like these are items of current research worth many millions in research grants. ;)
That being said, you'll need to process the text in layers and analyze it at each layer. For items 2 and 3 you'll find that training an SVM on word n-grams (try n = 3) helps; there is a sketch below. For items 1 and 4 you'll want deeper analysis: use a tool like NLTK, or one of the many other parsers, to find the subject words in sentences and related words, and also use WordNet (from Princeton) to find the most common senses used and take those as keywords.
Item 5 is extremely challenging. I think intelligent use of the data above can give you what you want, but you'll need all your grammatical and programming knowledge, and it will still be very coarse-grained.
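
For the SVM-on-n-grams suggestion, here is a minimal sketch using scikit-learn (my choice of library; the four labeled texts are toy stand-ins for real training data):

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.pipeline import make_pipeline
    from sklearn.svm import LinearSVC

    # Toy training data: texts labeled with a mood.
    texts = ["I love this, it is wonderful", "This is awful and sad",
             "What a great day", "I am so disappointed"]
    labels = ["positive", "negative", "positive", "negative"]

    # Word n-grams up to length 3, fed into a linear SVM.
    model = make_pipeline(TfidfVectorizer(ngram_range=(1, 3)), LinearSVC())
    model.fit(texts, labels)

    print(model.predict(["what a wonderful day"]))  # -> ['positive']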
It sounds like you might be open to some experimentation, in which case a toolkit approach might be best. If so, look at NLTK, the Natural Language Toolkit for Python. It is open source under the Apache license, and there are a couple of excellent books about it (including one from O'Reilly, which is also released online under a Creative Commons license).

Evaluate the content of a paragraph

We are building a database of scientific papers and performing analysis on the abstracts. The goal is to be able to say "Interest in this topic has gone up 20% from last year". I've already tried keyword analysis and haven't really liked the results. So now I am trying to move on to phrases and the proximity of words to each other, and I realize I'm in over my head. Can anyone point me to a better solution, or at the very least give me a good term to google to learn more?
The language used is Python, but I don't think that really affects your answer. Thanks in advance for the help.
It is a big subject, but a good introduction to NLP can be found in the NLTK toolkit. It is intended for teaching and works with Python, i.e. it is good for dabbling and experimenting. There is also a very good open-source book (also available in paper form from O'Reilly) on the NLTK website.
This is just a guess; I'm not sure this approach will work. If you're looking at phrases and the proximity of words, perhaps you can build up a Markov chain? That way you get an idea of the frequency of certain phrases/words in relation to others (based on the order of your Markov chain).
So you build a Markov chain and a frequency distribution for the year 2009, then build another one at the end of 2010 and compare the frequencies of certain phrases and words; a rough sketch follows. You might have to normalize the text, though.
Other than that, general natural-language-processing techniques come to mind (there is a lot of literature on the topic!).
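
A minimal sketch of that frequency-comparison idea using NLTK bigrams (the two one-line corpora are placeholders for the real 2009 and 2010 abstracts):

    from collections import Counter
    import nltk

    def bigram_freqs(abstracts):
        counts = Counter()
        for text in abstracts:
            counts.update(nltk.bigrams(nltk.word_tokenize(text.lower())))
        total = sum(counts.values())
        # Normalize so corpora of different sizes are comparable.
        return {bg: n / total for bg, n in counts.items()}

    freq_2009 = bigram_freqs(["gene expression analysis in yeast"])
    freq_2010 = bigram_freqs(["deep learning for gene expression analysis"])

    phrase = ("gene", "expression")
    change = freq_2010.get(phrase, 0) - freq_2009.get(phrase, 0)
    print(f"relative frequency change for {phrase}: {change:+.3f}")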

NLP: Qualitatively "positive" vs "negative" sentence

I need your help in determining the best approach for analyzing industry-specific sentences (e.g. movie reviews) for "positive" vs. "negative". I've seen libraries such as OpenNLP before, but they are too low-level: they just give me the basic sentence composition. What I need is a higher-level structure:
- hopefully with wordlists
- hopefully trainable on my own set of data
Thanks!
What you are looking for is commonly dubbed Sentiment Analysis. Typically, sentiment analysis is not able to handle delicate subtleties, like sarcasm or irony, but it fares pretty well if you throw a large set of data at it.
Sentiment analysis usually needs quite a bit of pre-processing: at least tokenization, sentence boundary detection and part-of-speech tagging (a small pipeline sketch follows). Sometimes syntactic parsing can be important too. Doing this properly is an entire branch of research in computational linguistics, and I wouldn't advise coming up with your own solution unless you take the time to study the field first.
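
That minimal pre-processing pipeline in NLTK might look like this (assuming the "punkt" and POS-tagger models have been downloaded):

    import nltk

    text = "The film was great. The ending, however, fell flat."

    # Sentence boundary detection, then tokenization, then POS tagging.
    for sentence in nltk.sent_tokenize(text):
        print(nltk.pos_tag(nltk.word_tokenize(sentence)))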
OpenNLP has some tools to aid sentiment analysis, but if you want something more serious, you should look into the LingPipe toolkit. It has some built-in SA-functionality and a nice tutorial. And you can train it on your own set of data, but don't think that it is entirely trivial :-).
Googling for the term will probably also give you some resources to work with. If you have any more specific question, just ask, I'm watching the nlp-tag closely ;-)
Some approaches to sentiment analysis use strategies that are popular in other text-classification tasks. The most common is to transform each film review into a word vector and feed it into a classifier algorithm as training data. Most popular data-mining packages can help you here; for example, there is a tutorial on sentiment classification illustrating how to run such an experiment with the open-source RapidMiner toolkit.
Incidentally, there is a good data set, based on IMDb user reviews, made available for research on detecting opinion in film reviews, and you can check a lot of related work on the area and how it uses the data set.
It's worth bearing in mind that the effectiveness of these methods can only be judged statistically, so you can pretty much assume there will be misclassifications and cases where opinion is hard to detect. As already noted in this thread, detecting things like irony and sarcasm can be very difficult indeed.
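
To make the word-vector-plus-classifier recipe concrete, here is a sketch using scikit-learn (my choice) on NLTK's movie_reviews corpus, which packages the IMDb-based polarity data mentioned above (run nltk.download("movie_reviews") first):

    from nltk.corpus import movie_reviews
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.model_selection import train_test_split
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import make_pipeline

    # Each review becomes one document string labeled "pos" or "neg".
    docs = [movie_reviews.raw(fid) for fid in movie_reviews.fileids()]
    labels = [fid.split("/")[0] for fid in movie_reviews.fileids()]

    X_train, X_test, y_train, y_test = train_test_split(
        docs, labels, test_size=0.2, random_state=0)

    # Bag-of-words vectors fed into a Naive Bayes classifier.
    model = make_pipeline(CountVectorizer(), MultinomialNB())
    model.fit(X_train, y_train)
    print("held-out accuracy:", model.score(X_test, y_test))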
