Natural Language Processing books or resources for an entry-level person? [closed]

Can anyone give some suggestions for a good natural language processing book? These are the factors I have in mind:
It gives a good overview of this huge topic without too much depth.
Concepts are explained in pictures.
Sample code in Java/Python/R.

You can look at online courses about NLP. They often contain videos, exercises, written documents, suggested readings...
I especially like this one: (see the suggested readings section, for instance). You can access the lectures here: (pdf + video + subtitles).

I believe there are three options for you--I wrote one of them so take this with a grain of salt.
1) Natural Language Processing with Python by Steven Bird et al. This book covers the NLP API NLTK and is considered a solid introduction to NLP. It has lots of code and a more academic take on what NLP is, and I assume it is broadly used in undergraduate NLP classes.
2) Natural Language Processing with Java by Richard Reese. This covers a range of APIs, including LingPipe below, and introduces NLP concepts and how they are implemented in a range of open source APIs. It is a shallower dive into NLP, but it is a gentler introduction, and it covers how several APIs solve the same problem, so it may help you pick which API to use.
3) Natural Language Processing with Java and LingPipe Cookbook by Breck Baldwin (me) and Krishna Dayanidhi. This is meant for industrial programmers and covers the concepts common in commercial NLP applications. The book is a much deeper dive into evaluation, problem specification, and varied technologies that on the face of it do the same thing. But it expects you to learn from examples (overwhelmingly Twitter data).
All the books have lots of code, one in Python, the other two in Java. Both present mature APIs with a large installed base.
None of the books do much in the way of graphical explanation of what the software is doing.
Good luck


What do I need to know on NLP to be able to use and train Stanford NLP for intent analysis? [closed]

Any book, tutorial, or course recommendations would be much appreciated.
I need to know what level of NLP knowledge I need in order to understand Stanford NLP and train it for my application, commercial sentiment analysis.
My goal is not a career in NLP or to become an expert, only to be proficient enough to understand and use the open source NLP frameworks properly and train them for my application.
For this level, what NLP study/training would be needed?
I'm learning C# and .NET as well.
First: to simply use a sentiment model or train on existing data, there is not too much background to learn:
Constituency parsing, parse trees, etc.
Basic machine learning concepts (classification, cost functions, training / development sets, etc.)
These are well-documented ideas and are all a Google away. It might be worth it to skim the Coursera Natural Language Processing course (produced by people here at Stanford!) for the above ideas as well.
After that, the significant task is understanding how the RNTN sentiment model inside CoreNLP works. You don't need to grasp the math fully, I suppose, but the basic recursive nature of the algorithm is important to understand. The best resource is of course the original paper (and there's not much else, to be honest).
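To make the "recursive nature" concrete, here is a minimal sketch of how a recursive model builds a vector for a phrase bottom-up from a binary parse tree. It is not CoreNLP code: the names `WORD_VECTORS`, `W`, `V`, `compose` and `encode` are hypothetical, the weights are hand-picked rather than learned, and the composition is my reading of the paper's tensor-plus-linear layer.

```python
import math

# Hypothetical toy setup: 2-dimensional word vectors and hand-picked weights.
# A real model learns all of these from labelled parse trees.
D = 2
WORD_VECTORS = {
    "not": [-0.5, 0.1],
    "very": [0.2, 0.3],
    "good": [0.8, 0.4],
}
# W maps the concatenated children (length 2*D) to a parent vector (length D).
W = [[0.5, 0.0, 0.5, 0.0],
     [0.0, 0.5, 0.0, 0.5]]
# V holds one (2*D x 2*D) slice per output dimension: the "tensor" part.
V = [[[0.1 if i == j else 0.0 for j in range(2 * D)] for i in range(2 * D)]
     for _ in range(D)]


def compose(left, right):
    """Combine two child vectors into one parent vector of the same size."""
    c = left + right  # list concatenation, length 2*D
    parent = []
    for k in range(D):
        bilinear = sum(c[i] * V[k][i][j] * c[j]
                       for i in range(2 * D) for j in range(2 * D))
        linear = sum(W[k][i] * c[i] for i in range(2 * D))
        parent.append(math.tanh(bilinear + linear))
    return parent


def encode(tree):
    """Recursively encode a binary parse tree given as nested tuples."""
    if isinstance(tree, str):   # leaf: look up the word vector
        return WORD_VECTORS[tree]
    left, right = tree          # internal node: encode both children first
    return compose(encode(left), encode(right))


phrase = ("not", ("very", "good"))
vector = encode(phrase)
print(vector)
```

The point to take away is that the same `compose` function is applied at every node, so the vector for "not very good" depends on the vector already computed for "very good". In the actual model a classifier on top of each node vector then predicts that node's sentiment label.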
To train your own sentiment model, you'll need your own sentiment data. Producing this data is no small task. The data for the Stanford sentiment model was crowdsourced, and you may need to do something similar if you want to collect anything near the same scale.
The RNTN sentiment paper (linked above) gives some details on the data format. I'm happy to expand on this further if you do wish to create your own data.
I think you simply need to understand the concepts of supervised and unsupervised learning. In addition, some Java knowledge might be useful.

Diagrammatic method to model software components, their interactions & I/O [closed]

I'd like to model software components and the interactions between them: what information is passed, what processes take place in each component (not too detailed), and a clear specification of the input/output of the components.
What I've seen so far in UML is far too abstract and doesn't go into enough detail.
Any suggestions?
Some people design programs on paper as diagrams,
then pass them to software developers to construct.
This approach has been tried: "clever guys" do the modeling and pass the models to "ordinary" developers to do the laborious work. It did not work.
We like analogies. We often compare software to the construction industry, where some people draw models and blueprints and others do the building. At first we assume that UML or other model diagrams are the equivalent of construction blueprints. But it seems that we are wrong.
To make an analogy with the construction industry, our blueprints are not
models and diagrams; our blueprints are actually the code we write.
Detailed Paper Models like Cooking Recipes
It is not realistic to design a software system entirely on paper with detailed models up front. Software development is an iterative and incremental process.
Think of a map maker who makes a paper map of a city as big as the city itself, because the modeler includes every detail without any level of abstraction. Will it be useful?
Is Modeling Useless?
Definitely not. But you should apply it to the difficult parts of your problem-solution space, not to every trivial part.
So instead of handing developers every detail of the system on paper, explore the difficult parts of the problem-solution space with the developers face to face, using visual diagrams.
In the software industry, like it or hate it, source code is still
king. And all models are liars until they are implemented and tested.

Best Open source / free NLP engine for the job [closed]

Let's say that I have a pool (a list) of well-known phrases, like:
{ "I love you", "Your mother is a ...", "I think I am pregnant" ... } Let's say about 1000 like these. Now I want users to enter free text into a text box, and have some kind of NLP engine digest the text and find the 10 most relevant phrases from the pool that may be related in some way to the text.
I thought that the simplest implementation could be looking by the words. Picking each time one word and looking for similarities in some way. Not sure which?
What most frightens me is the size of a vocabulary that I must support. I am a single developer of some kind of a demo, and I don't like the idea of filling in words into a table...
I am looking for a free NLP engine. I am agnostic about the language it's written in, but it must be free, NOT some kind of online service that charges by API call.
It seems that TextBlob and ConceptNet are a more than adequate solution to this problem!
TextBlob is an easy-to-use NLP library for Python that is free and open source (licensed under the permissive MIT License). It provides a nice wrapper around the excellent NLTK and pattern libraries.
One simple approach to your problem would be to extract noun phrases from your given text.
Here's an example from the TextBlob docs.
from textblob import TextBlob  # older releases used: from text.blob import TextBlob

text = '''
The titular threat of The Blob has always struck me as the ultimate movie
monster: an insatiably hungry, amoeba-like mass able to penetrate
virtually any safeguard, capable of--as a doomed doctor chillingly
describes it--"assimilating flesh on contact."
Snide comparisons to gelatin be damned, it's a concept with the most
devastating of potential consequences, not unlike the grey goo scenario
proposed by technological theorists fearful of
artificial intelligence run rampant.
'''

blob = TextBlob(text)
blob.noun_phrases
# => ['titular threat', 'blob', 'ultimate movie monster', ...]
This could be a starting point. From there you could experiment with other methods, such as similarity methods as mentioned in the comments or TF-IDF. TextBlob also makes it easy to swap models for noun phrase extraction.
Full disclosure: I am the author of TextBlob.
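For the TF-IDF route mentioned above, here is a rough standard-library sketch of ranking the phrase pool against the user's text by cosine similarity. It does not use TextBlob; `PHRASES`, `tokenize`, `build_idf`, `vectorize`, `cosine` and `top_matches` are names I made up for illustration, and the smoothed IDF formula is just one common choice.

```python
import math
import re
from collections import Counter

PHRASES = [  # hypothetical stand-ins for the ~1000 known phrases
    "I love you",
    "I think I am pregnant",
    "I love my dog",
    "The weather is nice today",
]


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def build_idf(docs):
    """Inverse document frequency per word, smoothed so weights stay positive."""
    n = len(docs)
    df = Counter(word for doc in docs for word in set(tokenize(doc)))
    return {word: math.log((1 + n) / (1 + count)) + 1 for word, count in df.items()}


def vectorize(text, idf):
    """Sparse TF-IDF vector; words never seen in the phrase pool are ignored."""
    counts = Counter(tokenize(text))
    return {w: c * idf[w] for w, c in counts.items() if w in idf}


def cosine(a, b):
    dot = sum(weight * b.get(word, 0.0) for word, weight in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_matches(query, phrases, k=10):
    """Return up to k phrases with non-zero similarity, best match first."""
    idf = build_idf(phrases)
    q = vectorize(query, idf)
    scored = [(cosine(q, vectorize(p, idf)), p) for p in phrases]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [p for score, p in scored[:k] if score > 0]


print(top_matches("I really love you so much", PHRASES, k=2))
```

This only matches on shared words, so it addresses the vocabulary worry (no hand-filled word table) but will miss synonyms. Matching on extracted noun phrases, or adding a resource like ConceptNet, would be a way to go beyond exact word overlap.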

Article about code density as a measure of programming language power [closed]

I remember reading an article saying something like
"The number of bugs introduced doesn't vary much with different programming languages, but it depends pretty much on SLOC (source lines of code). So, using the programming language that can implement the same functions with smaller SLOC is preferable in terms of stability."
The author wanted to stress the advantages of using Functional Programming, as normally one can program with a smaller number of LOC. I remember the author cited a research paper about the irrelevance of choice of programming language and the number of bugs.
Is there anyone who knows the research paper or the article?
Paul Graham wrote something very like this in his essay Succinctness is Power. He quotes a report from Ericsson, which may be the paper you remember?
Reports from the field, though they will necessarily be less precise than "scientific" studies, are likely to be more meaningful. For example, Ulf Wiger of Ericsson did a study that concluded that Erlang was 4-10x more succinct than C++, and proportionately faster to develop software in:
Comparisons between Ericsson-internal development projects indicate similar line/hour productivity, including all phases of software development, rather independently of which language (Erlang, PLEX, C, C++, or Java) was used. What differentiates the different languages then becomes source code volume.
I'm not sure if it's the source you're thinking of, but there's something about this in Code Complete chapter 27.3 (p652) - that references "Program Quality and Programmer Productivity" (Jones 1977) and "Estimating Software Costs" (Jones 1998).
I've seen this argument about "succinctness = power" a few times, and I've never really bought it. That's because there are languages (e.g., J, Ursala) which are quite succinct but not (IMO) easy to read because they put so much meaning into individual symbols.
Perhaps the true metric should be the extent to which it is possible to write a particular algorithm both clearly and succinctly. Mind you, I don't know how to measure that.
The book Pragmatic Thinking and Learning points to this article:
Can a Manufacturing Quality Model Work for Software?

Where to study computational geometry? [closed]

I want to solve geometry problems in online programming contests. But whenever I read them, I find them too difficult. Please suggest some books and resources from which I can study computational geometry.
A classic work: Computational Geometry in C.
And there's also:
In order to solve basic geometry problems quickly, so that it runs within the time limits of the contest, you need to make certain you have a strong grasp of writing algorithms.
This page has some good suggestions on how to get better. It is set up as a two semester course of reading.
You can try the problem archive on TopCoder.
But you should register first.
On the filter choose:
Category: Geometry
Division II Level: Level One or Level Two.
Almost all problems have descriptions of their solutions.
They are pretty simple compared with a random geometry problem picked from some contest archive.
On the page you can also find a lot of tutorials, including geometric ones.
I recommend two books (among others):
The Algorithm Design Manual By Steven S. Skiena - discusses algorithms in general, but has a lot of useful information about computational geometry
Computational Geometry: Algorithms and Applications
If you want to get your basics clear, this is a good starting point - there are some practice problems in the article as well.
You should also read through this article - that covers some advanced concepts.
You must know convex hull and point-in-polygon. Often on TopCoder people create a reusable library for geometry applications, since the same code is used many times.
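As a starting point for such a library, here is a small sketch of both routines. I picked Andrew's monotone chain for the hull and ray casting for point-in-polygon because they are short; they are not the only options, and the function names are my own.

```python
def cross(o, a, b):
    """Cross product of vectors OA and OB; positive means a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        # Drop the last point while it would make a right turn or a straight line.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other one.
    return lower[:-1] + upper[:-1]


def point_in_polygon(point, polygon):
    """Ray casting: count edge crossings of a horizontal ray going right.

    Points exactly on the boundary are not handled specially.
    """
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # the edge straddles the ray's height
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


points = [(0, 0), (4, 0), (4, 4), (0, 4), (2, 2), (1, 3)]
hull = convex_hull(points)
print(hull)                              # [(0, 0), (4, 0), (4, 4), (0, 4)]
print(point_in_polygon((2, 2), hull))    # True
print(point_in_polygon((5, 1), hull))    # False
```

Integer coordinates keep `cross` exact, which is worth preserving in contest code whenever the input allows it.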
Check lbackstrom's tutorial for a start. Computational Geometry by de Berg, Cheong, van Kreveld, Overmars [edit: already mentioned by Bart] might be more than you need.
And of course there's Computational Geometry - An Introduction, by Preparata and Shamos. I own it, and recommend it for an introduction to the principles. Not really a dictionary of code, though.
Here are two excellent books, I used them as textbooks at university:
J D Foley, A van Dam et al. Introduction to Computer Graphics. Addison-Wesley, 1994, ISBN 0-201-60921-5.
D Hearn and M P Baker. Computer Graphics with OpenGL (3rd edition). Prentice-Hall, 2004, ISBN 0-13-120238-3.
