What do I need to know about NLP to be able to use and train Stanford NLP for intent analysis? [closed]

Any book, tutorial, or course recommendations would be much appreciated.
I need to know what level of NLP knowledge I need in order to comprehend Stanford NLP and to train and customize it for my commercial sentiment analysis app.
My goal is not a career in NLP or to become an NLP expert, but only to be proficient enough to understand and properly use the open source NLP frameworks and to train them for my application.
For this level, what NLP study/training would be needed?
I'm learning C# and .NET as well.

First: to simply use a sentiment model or train on existing data, there is not too much background to learn:
Tokenization
Constituency parsing, parse trees, etc.
Basic machine learning concepts (classification, cost functions, training / development sets, etc.)
These are well-documented ideas and are all a Google away. It might be worth it to skim the Coursera Natural Language Processing course (produced by people here at Stanford!) for the above ideas as well.
After that, the significant task is understanding how the RNTN sentiment model inside CoreNLP works. You don't need to grasp the math fully, I suppose, but the basic recursive nature of the algorithm is important to understand. The best resource is of course the original paper (and there's not much else, to be honest).
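If you just want to see the pretrained model in action before digging into the paper, here is a minimal Java sketch, assuming the CoreNLP jars and the default English sentiment model are on your classpath (the example sentence is made up):

import java.util.Properties;
import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.sentiment.SentimentCoreAnnotations;
import edu.stanford.nlp.util.CoreMap;

public class SentimentDemo {
    public static void main(String[] args) {
        // The sentiment annotator needs tokenization, sentence splitting, and parsing first.
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize, ssplit, parse, sentiment");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

        Annotation doc = pipeline.process("The service was slow, but the food was wonderful.");
        for (CoreMap sentence : doc.get(CoreAnnotations.SentencesAnnotation.class)) {
            // One of: Very negative, Negative, Neutral, Positive, Very positive
            String label = sentence.get(SentimentCoreAnnotations.SentimentClass.class);
            System.out.println(label + "\t" + sentence);
        }
    }
}

Each sentence label is produced by the RNTN running over that sentence's binarized parse tree, which is why the parse annotator is required.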
To train your own sentiment model, you'll need your own sentiment data. Producing this data is no small task. The data for the Stanford sentiment model was crowdsourced, and you may need to do something similar if you want to collect anything near the same scale.
The RNTN sentiment paper (linked above) gives some details on the data format. I'm happy to expand on this further if you do wish to create your own data.
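For concreteness, the treebank format the model trains on is one binarized parse tree per line, with a sentiment label from 0 (very negative) to 4 (very positive) on every node. A hand-made line in that style (not taken from the actual corpus) looks like:

(4 (2 (2 This) (2 movie)) (4 (4 (2 was) (4 great)) (2 .)))

Labeling every node consistently, not just whole sentences, is the expensive part, and is exactly why the Stanford data was crowdsourced.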

I think you simply need to comprehend the concepts of supervised and unsupervised learning. In addition, some Java knowledge might be useful.

Related

Natural Language Processing books or resources for an entry-level person? [closed]

Can anyone give some suggestions for a good natural language processing book? Following are the factors I have in mind:
It gives a good overview of these huge topics without too much depth.
Concepts are explained in picture form.
Sample code in Java/Python/R.
You can look at online courses about NLP. They often contain videos, exercises, written documents, suggested readings...
I especially like this one: https://www.coursera.org/course/nlp (see the suggested readings section, for instance). You can access the lectures here: https://class.coursera.org/nlp/lecture (pdf + video + subtitles).
I believe there are three options for you--I wrote one of them so take this with a grain of salt.
1) Natural Language Processing with Python
by Steven Bird et al. http://amzn.com/0596516495. This book covers the NLTK API and is considered a solid introduction to NLP. Lots of code, a more academic take on what NLP is, and I assume it is broadly used in undergraduate NLP classes.
2) Natural Language Processing with Java by Richard Reese http://amzn.to/1D0liUY. This covers a range of APIs, including LingPipe below, and introduces NLP concepts and how they are implemented in a range of open source APIs. It is a shallower dive into NLP, but it is a gentler introduction, and it shows how a bunch of APIs solve the same problem, so it may help you pick which API to use.
3) Natural Language Processing with Java and LingPipe Cookbook by Breck Baldwin (me) and Krishna Dayanidhi http://amzn.to/1MvgHxa. This is meant for industrial programmers and covers the concepts common in commercial NLP applications. The book is a much deeper dive into evaluation, problem specification, and varied technologies that on the face of it do the same thing. But it expects you to learn from examples (overwhelmingly Twitter data).
All the books have lots of code, one in Python, the other two in Java. All present mature APIs with a large installed base.
None of the books do much in the way of graphical explanation of what the software is doing.
Good luck

Diagrammatic method to model software components, their interactions & I/O [closed]

I'd like to model software components and the interaction between them: what information is passed, what processes take place in each component (not too detailed), and a clear specification of the components' input/output.
What I've seen so far in UML is far too abstract and doesn't go into enough detail.
Any suggestions?
Some people design programs on paper as diagrams, then pass them to software developers to construct.
This approach has been tried: "clever guys" do the modeling and pass the models to "ordinary" developers to do the laborious work. And it has not worked.
We like analogies, so we often compare software to the construction industry, where some people make the models and blueprints and others do the building. Our first thought is that UML and other model diagrams are the equivalent of the construction industry's blueprints. But it seems that we are wrong.
To make an honest analogy with the construction industry, our blueprints are not the models and diagrams; our blueprints are actually the code we write.
Detailed Paper Models Are Like Cooking Recipes
It is not realistic to design a software system entirely on paper with detailed models up front. Software development is an iterative and incremental process.
Think of a map maker who makes a paper map of a city as big as the city itself, because he includes every detail without any level of abstraction. Would it be useful?
Is Modeling Useless?
Definitely not. But you should apply it to the difficult parts of your problem-solution space, not to every trivial part.
So instead of handing developers every detail of the system on paper, explore the difficult parts of the problem-solution space with them face to face, using visual diagrams.
In the software industry, like it or hate it, source code is still the king. And all models are liars until they are implemented and tested.

Best Open source / free NLP engine for the job [closed]

Let's say that I have a pool (a list) of well-known phrases, like:
{ "I love you", "Your mother is a ...", "I think I am pregnant" ... } Let's say about 1,000 of these. Now I want users to enter free text into a text box, and some kind of NLP engine to digest the text and find the 10 most relevant phrases from the pool that may be related in some way to the text.
I thought that the simplest implementation could be matching by words: picking one word at a time and looking for similarities in some way. I'm not sure which, though.
What most frightens me is the size of the vocabulary that I must support. I am a single developer of a demo of sorts, and I don't like the idea of filling words into a table...
I am looking for a free NLP engine. I am agnostic about the language it's written in, but it must be free, NOT some kind of online service that charges per API call.
It seems that TextBlob and ConceptNet are a more than adequate solution to this problem!
TextBlob is an easy-to-use NLP library for Python that is free and open source (licensed under the permissive MIT License). It provides a nice wrapper around the excellent NLTK and pattern libraries.
One simple approach to your problem would be to extract noun phrases from your given text.
Here's an example from the TextBlob docs.
from textblob import TextBlob
text = '''
The titular threat of The Blob has always struck me as the ultimate movie
monster: an insatiably hungry, amoeba-like mass able to penetrate
virtually any safeguard, capable of--as a doomed doctor chillingly
describes it--"assimilating flesh on contact.
Snide comparisons to gelatin be damned, it's a concept with the most
devastating of potential consequences, not unlike the grey goo scenario
proposed by technological theorists fearful of
artificial intelligence run rampant.
'''
blob = TextBlob(text)
print(blob.noun_phrases)
# => ['titular threat', 'blob', 'ultimate movie monster', ...]
This could be a starting point. From there you could experiment with other methods, such as the similarity methods mentioned in the comments, or TF-IDF. TextBlob also makes it easy to swap in other models for noun phrase extraction.
Full disclosure: I am the author of TextBlob.
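If you would rather prototype without any NLP library at all, the word-overlap idea from the question can get you surprisingly far. Below is a rough, self-contained Java sketch (class and method names are my own invention) that weights shared words by their inverse document frequency across the phrase pool and returns the top matches:

import java.util.*;
import java.util.stream.Collectors;

public class PhraseMatcher {
    private final List<String> phrases;
    private final Map<String, Double> idf = new HashMap<>();

    PhraseMatcher(List<String> phrases) {
        this.phrases = phrases;
        // Document frequency: in how many phrases does each word occur?
        Map<String, Integer> df = new HashMap<>();
        for (String p : phrases)
            for (String w : new HashSet<>(tokenize(p)))
                df.merge(w, 1, Integer::sum);
        int n = phrases.size();
        df.forEach((w, c) -> idf.put(w, Math.log((double) n / c)));
    }

    static List<String> tokenize(String s) {
        return Arrays.stream(s.toLowerCase().split("[^\\p{L}]+"))
                .filter(w -> !w.isEmpty())
                .collect(Collectors.toList());
    }

    // Rank phrases by the total idf weight of the words they share with the query.
    // Stemming, stop words, or WordNet synonyms would be natural next steps.
    List<String> topMatches(String query, int k) {
        Set<String> queryWords = new HashSet<>(tokenize(query));
        return phrases.stream()
                .sorted(Comparator.comparingDouble((String p) -> -tokenize(p).stream()
                        .filter(queryWords::contains)
                        .mapToDouble(w -> idf.getOrDefault(w, 0.0))
                        .sum()))
                .limit(k)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        PhraseMatcher m = new PhraseMatcher(Arrays.asList(
                "I love you", "I think I am pregnant", "Where is the station"));
        System.out.println(m.topMatches("you know I love the station", 2));
    }
}

With only ~1,000 phrases this brute-force scoring is instantaneous, so there is no need to build a vocabulary table up front.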

Ease of use: Stanford CoreNLP vs. OpenNLP [closed]

I'm looking to use a suite of NLP tools for a personal project, and I was wondering whether Stanford's CoreNLP or OpenNLP is easier to use. Or is there another free package you would recommend?
I haven't really done any NLP before, so I am looking for something that I can quickly use to learn the concepts and prototype my ideas. Any help is appreciated.
My opinion on which is easier to use is biased, but regarding Ivan Akcheurov's answer, we only released Stanford CoreNLP in Oct 2010, so it isn't very old. Regarding his suggestions, it seems to depend on whether you want a higher-level processing framework or the actual processing tools. E.g., if you poke around Knime, it appears that the only NLP components included are actually OpenNLP ones, and most of the machine learning is wrapping Weka.... For groups of individual tools that work together, Stanford NLP, OpenNLP, NLTK, and LingPipe are perhaps the main choices.
I suggest GATE (gate.ac.uk):
GATE
Language: Java
Has UIMA integration support
Documentation: superbly documented! Video tutorials and a training course
Has GUI
Ability to use WordNet, Lucene, Google, Yahoo, Google Translate, Weka
Has some parts of LingPipe and OpenNLP as plugins
OpenNLP
Language: Java
SharpNLP (its C-Sharp port)
Has UIMA integration support
LingPipe
Language: Java
Documentation: Free book tutorials
NLTK
Language: Python
Documentation: an excellent free book
Corpora: Provides dozens of corpora (~850 MB) and lexicons such as WordNet, etc.
I suggest Stanford, as it provides multiple things in one package that is also open source. E.g., Stanford CoreNLP has:
Stanford Parser.
Stanford POS Tagger.
Stanford Named Entity Recognition.
Stanford Typed Dependencies, etc.
So, in short, under one umbrella you get multiple solutions.
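To illustrate the "one umbrella" point, a single CoreNLP pipeline can run several of those components in one pass. A minimal sketch, assuming the CoreNLP jars and the default English models are on the classpath:

import java.util.Properties;
import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;

public class PipelineDemo {
    public static void main(String[] args) {
        // POS tagging, lemmatization, and named entity recognition in one pipeline.
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

        Annotation doc = pipeline.process("Barack Obama visited Stanford in 2014.");
        for (CoreLabel token : doc.get(CoreAnnotations.TokensAnnotation.class)) {
            System.out.printf("%-10s %-6s %s%n",
                    token.word(),
                    token.get(CoreAnnotations.PartOfSpeechAnnotation.class),
                    token.get(CoreAnnotations.NamedEntityTagAnnotation.class));
        }
    }
}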

About "AUTOMATIC TEXT SUMMARIZER (lingustic based)" [closed]

I am having "AUTOMATIC TEXT SUMMARIZER (linguistic approach)" as my final year project. I have collected enough research papers and gone through them. Still I am not very clear about the 'how-to-go-for-it' thing. Basically I found "AUTOMATIC TEXT SUMMARIZER (statistical based)" and found that it is much easier compared to my project. My project guide told me not to opt this (statistical based) and to go for linguistic based.
Anyone who has ever worked upon or even heard of this sort of project would be knowing that summarizing any document means nothing but SCORING each sentence (by some approach involving some specific algos) and then selecting sentences having score more than threshold score. Now the most difficult part of this project is choosing the appropriate algorithm for scoring and later implementing it.
I have moderate programming skills and would like to code in JAVA (because there I'll get lots of APIs resulting in lesser overheads). Now I want to know that for my project, what should be my approach and algos used. Also how to implement them.
Using Lexical Chains for Text Summarization (Microsoft Research)
An analysis of different algorithms: DasMartins.2007
The most important part of the doc:
• Nenkova (2005) shows that no system could beat the baseline with statistical significance
• A striking result!
Note there are two different nuances to the linguistic approach:
Linguistic rating system (all clear here)
Linguistic generation (rewrites sentences to build the summary)
Automatic summarization is a pretty complex area - try to get your Java skills in order first, as well as your understanding of statistical NLP, which uses machine learning. You can then work through building something of substance.
Evaluate your solution, and make sure you have concretely defined your measurement variables and how you went about your evaluation; otherwise, your project is doomed to failure. This is generally considered a high-risk project for final-year undergraduate students, as they often fail to get the principles right, then implement them incorrectly, and then their evaluation measures are all ill-defined and don't clearly reflect their own work.
My advice would be to focus on one area of summarization rather than many, as you can have single- and multi-document summaries. The more varied you make your project, the less likely you are to receive a good mark. Keep it focused and in depth. Evaluate other people's work, then the process you decided to take, and the outcomes of that.
Readings:
- Jurafsky's NLP book; there is a section near the back on summarization and QA.
- Advances in Text Summarization by Inderjeet Mani is really good.
Understand things like term weighting, centroid-based summarization, log-likelihood ratio, coherence relations, sentence simplification, maximum marginal relevance, and redundancy, and what a focused summary actually is.
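For instance, plain term weighting, the simplest of those ideas, already yields a complete score-and-select skeleton. Here is a deliberately naive Java sketch (names and thresholds are my own; a linguistic system would mostly swap in a smarter scoring function):

import java.util.*;

public class NaiveSummarizer {
    public static void main(String[] args) {
        String document = "Text summarization is hard. Many systems score sentences. "
                + "Scores often come from word frequency. High scoring sentences form the summary. "
                + "The weather was nice yesterday.";
        String[] sentences = document.split("(?<=[.!?])\\s+");

        // Step 1: count content-word frequencies over the whole document.
        Map<String, Integer> freq = new HashMap<>();
        for (String s : sentences)
            for (String w : s.toLowerCase().split("[^a-z]+"))
                if (w.length() > 3)            // crude stop-word filter
                    freq.merge(w, 1, Integer::sum);

        // Step 2: score each sentence by the average frequency of its content words.
        double[] score = new double[sentences.length];
        double mean = 0;
        for (int i = 0; i < sentences.length; i++) {
            int sum = 0, count = 0;
            for (String w : sentences[i].toLowerCase().split("[^a-z]+"))
                if (w.length() > 3) { sum += freq.get(w); count++; }
            score[i] = count == 0 ? 0 : (double) sum / count;
            mean += score[i] / sentences.length;
        }

        // Step 3: keep sentences scoring above the mean, in their original order.
        for (int i = 0; i < sentences.length; i++)
            if (score[i] > mean)
                System.out.println(sentences[i]);
    }
}

Replacing step 2 with, say, lexical-chain strength or coherence-based weights is where the linguistic approach comes in; the score-and-select skeleton and the evaluation around it can stay the same.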
You can attempt it using a supervised or an unsupervised approach as well as a hybrid.
The linguistic approach is the safer option; that is why you have been advised to take it.
Try attempting it linguistically first, then build statistical methods on top to hybridize your solution.
Use it as an exercise to learn the theory and the practical implications of the algorithms, and to build on your knowledge, as you will no doubt have to explain and defend your project to the judging panel.
If you have really read those research papers and books, you probably know what is already known. Now it is up to you to implement the knowledge from those papers and books in a Java application. Or you could expand human knowledge through some innovation or invention; if you do expand human knowledge, you will have become a true scientist.
Please make your question more specific, in these two main areas:
Project definition: What is the goal of your project?
Is the input unit a single document? A list of documents?
Do you intend your program to use machine learning?
What is the output?
How will you measure success?
Your background knowledge: You intend to use linguistic rather than statistical methods.
Do you have background in parsing natural language? In semantic representation?
I think some of these questions are tough. I am asking them because I spent too much time trying to answer similar questions in the course of my studies. Once you get these sorted out, I may be able to give you some pointers. Mani's "Automatic Summarization" looks like a good start, at least the introductory chapters.
The University of Sheffield did some work on automatic email summarising as part of the EU FASiL project a few years back.
