Are there published generative grammars for natural languages?

Are there published generative grammars for natural languages? - nlp

I have some ideas to do with natural language processing. I will need some grammars of the
S -> NP VP
variety in order to play with them.
If I try to write these rules myself it will be a tedious and error-prone business. Has anyone ever typed up and released comprehensive rule sets for English and other natural languages? Ideally written in BNF, Prolog or similar syntax.
My project only relates to context-free grammars, I'm not interested in statistical methods or machine learning -- I need to systematically produce Engligh-like and Foobarian-like sentences.
If you know where to find such materiel, I'd very much appreciate it.

You might want to look at Attempto Controlled English and its Prolog-based tools.
Since statistical parsing came in vogue in the early 90s, grammars have usually not been distributed, except for specific problem domains, but derived from distributed corpora such as the Penn Treebank. If you can get a hold of that (I believe a sample is distributed with NLTK), you can "roll your own" grammar by looking at all tree fragments and translating them to rules. (E.g., if you find a node labeled S with children labeled NP and VP, you know there should be a rule S -> NP VP. Pruning the rules that occur infrequently would be a good idea.)

The most comprehensive context-free grammar for English that I know of is the one described in:
Gazdar, Gerald; Ewan H. Klein, Geoffrey K. Pullum, Ivan A. Sag. 1985. Generalized Phrase Structure Grammar. Oxford: Blackwell.
There are also several rule-based but non-context-free grammars freely available online, e.g., the Penn XTAG grammar or the HPSG English Resource Grammar.

Look into the Grammatical Framework. It is a functional programming language for multilingual grammar applications which comes with libraries for ~30 languages, among them English.

Related

Anyone know of any real systems using Computational Semantics with Lambda Calculus?

I was wondering if Computational Semantics is actually used in any real-world system? (Simple examples here and here). I would like to see how an actual system works.
It seems like there are a bunch of issues with actually using Computational Semantics in any real world system:
It seems just labeling sentences with part-of-speech tags is error prone.
But you also need a reliable parse tree which is error prone and there can be many valid trees for one sentence.
Finding what pronouns are referring to what entities is error prone.
Word disambiguation is also another source of errors and multiple meanings could be valid in the same context.
Any context-free-grammar of English I can find seems to be incomplete.
Finally, after all these sources of error are dodged, we can finally convert the sentence to FOL with Computation Semantics!
Also, I can't seem to figure out how to deal with prepositions in Computation Semantics.
Is this really just an academic exercise or is Computational Semantics actually useful?

There are several better aproaches to natural language than simple lambda calculus and context free grammars, ie. HPSG, Montague Grammar, TAG, ...
Word disambiguation can be handled by Markov chains, for example.
Siri, Google Now, Cortana and IBM Watson are some examples for real world systems.
Google Translate is another application that uses Computational Semantics.

I believe (bu't don't quote me on this) that the technology spun out of the now-defunct Natural Language Theory and Technology group at the Palo Alto Research Center (PARC, formerly Xerox PARC), utilizes the lambda calculus to provide inferences about textual entailments. idk i only worked there a summer (freshman, so was wonderfully igorant of most of the goings-on there).
Anyway, that 'technology' was developed over 30 years and then Powerset bought the right to all of it for $15 million, attempting to disrupt smart search in general. Then Bing's fatass came along, gobbled it up nom nom nom, then continued devouring the entire research group as whole. The principal core investigators now work solely as adjuct profs at Stanford. Sad.

flow charts for languages

I was looking at some syntax diagrams for SQLite and was wondering if they could be used to describe all languages (like Python, C++, etc.)?
http://www.sqlite.org/lang_createtable.html
From some CS classes I took years ago I remember groups of languages the could be described by DFA and what not, but don't remember many details and think this is probably different anyways.
Any clarity would be appreciated.

You wouldn't usually call them "flow charts", but "syntax diagrams" (as you did) or "railroad diagrams". See the Wikipedia article for details, and feel free to use my Railroad Diagram Generator for generating them from an EBNF grammar.
A DFA corresponds to regular grammars, whereas EBNF and syntax diagrams describe context free grammars. These are different levels of the Chomsky hierarchy, which is the basic framework for classifying formal grammars.

NLP: Language Analysis Techniques and Algorithms

Situation:
I wish to perform a Deep-level Analysis of a given text, which would mean:
Ability to extract keywords and assign importance levels based on contextual usage.
Ability to draw conclusions on the mood expressed.
Ability to hint on the education level (word does this a little bit though, but something more automated)
Ability to mix-and match phrases and find out certain communication patterns
Ability to draw substantial meaning out of it, so that it can be quantified and can be processed for answering by a machine.
Question:
What kind of algorithms and techniques need to be employed for this?
Is there a software that can help me in doing this?

When you figure out how to do this please contact DARPA, the CIA, the FBI, and all other U.S. intelligence agencies. Contracts for projects like these are items of current research worth many millions in research grants. ;)
That being said you'll need to process it in layers and analyze at each of those layers. For items 2 and 3 you'll find training an SVM on n-tuples (try, 3) words will help. For 1 and 4 you'll want deeper analysis. Use a tool like NLTK, or one of the many other parsers and find the subject words in sentences and related words. Also use WordNet (from Princeton)
to find the most common senses used and take those as key words.
5 is extremely challenging, I think intelligent use of the data above can give you what you want, but you'll need to use all your grammatical knowledge and programming knowledge, and it will still be very rough grained.

It sounds like you might be open to some experimentation, in which case a toolkit approach might be best? If so, look at the NLTK Natural Language Toolkit for Python. Open source under the Apache license, and there are a couple of excellent books about it (including one from O'Reilly which is also released online under a creative commons license).

Programmatic parsing and understanding of language (English)

I am looking for some resources pertaining to the parsing and understanding of English (or just human language in general). While this is obviously a fairly complicated and wide field of study, I was wondering if anyone had any book or internet recommendations for study of the subject. I am aware of the basics, such as searching for copulas to draw word relationships, but anything you guys recommend I will be sure to thoroughly read.
Thanks.

Check out WordNet.

You probably want a book like "Representation and Inference for Natural Language - A First Course in Computational Semantics"
http://homepages.inf.ed.ac.uk/jbos/comsem/book1.html

Another way is looking at existing tools that already do the job on the basis of research papers: http://nlp.stanford.edu/index.shtml
I've used this tool once, and it's very nice. There's even an online version that lets you parse English and draws dependency trees and so on.
So you can start taking a look at their papers or the code itself.
Anyway take in consideration that in any field, what you get from such generic tools is almost always not what you want. In the sense that the semantics attributed by such tools is not what you would expect. For most cases, given a specific constrained domain it's preferable to roll your own parser, and do your best to avoid any ambiguities beforehand.

The process that you describe is called natural language understanding. There are various algorithms and software tools that have been developed for this purpose.

What are good starting points for someone interested in natural language processing? [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 9 years ago.
Question
So I've recently came up with some new possible projects that would have to deal with deriving 'meaning' from text submitted and generated by users.
Natural language processing is the field that deals with these kinds of issues, and after some initial research I found the OpenNLP Hub and university collaborations like the attempto project. And stackoverflow has this.
If anyone could link me to some good resources, from reseach papers and introductionary texts to apis, I'd be happier than a 6 year-old kid opening his christmas presents!
Update
Through one of your recommendations I've found opencyc ('the world's largest and most complete general knowledge base and commonsense reasoning engine'). Even more amazing still, there's a project that is a distilled version of opencyc called UMBEL. It features semantic data in rdf/owl/skos n3 syntax.
I've also stumbled upon antlr, a parser generator for 'constructing recognizers, interpreters, compilers, and translators from grammatical descriptions'.
And there's a question on here by me, that lists tons of free and open data.
Thanks stackoverflow community!

Tough call, NLP is a much wider field than most people think it is. Basically, language can be split up into several categories, which will require you to learn totally different things.
Before I start, let me tell you that I doubt you'll have any notable success (as a professional, at least) without having a degree in some (closely related) field. There is a lot of theory involved, most of it is dry stuff and hard to learn. You'll need a lot of endurance and most of all: time.
If you're interested in the meaning of text, well, that's the Next Big Thing. Semantic search engines are predicted as initiating Web 3.0, but we're far from 'there' yet. Extracting logic from a text is dependant on several steps:
Tokenization, Chunking
Disambiguation on a lexical level (Time flies like an arrow, but fruit flies like a banana.)
Syntactic Parsing
Morphological analysis (tense, aspect, case, number, whatnot)
A small list, off the top of my head. There's more :-), and many more details to each point. For example, when I say "parsing", what is this? There are many different parsing algorithms, and there are just as many parsing formalisms. Among the most powerful are Tree-adjoining grammar and Head-driven phrase structure grammar. But both of them are hardly used in the field (for now). Usually, you'll be dealing with some half-baked generative approach, and will have to conduct morphological analysis yourself.
Going from there to semantics is a big step. A Syntax/Semantics interface is dependant both, on the syntactic and semantic framework employed, and there is no single working solution yet. On the semantic side, there's classic generative semantics, then there is Discourse Representation Theory, dynamic semantics, and many more. Even the logical formalism everything is based on is still not well-defined. Some say one should use first-order logic, but that hardly seems sufficient; then there is intensional logic, as used by Montague, but that seems overly complex, and computationally unfeasible. There also is dynamic logic (Groenendijk and Stokhof have pioneered this stuff. Great stuff!) and very recently, this summer actually, Jeroen Groenendijk presented a new formalism, Inquisitive Semantics, also very interesting.
If you want to get started on a very simple level, read Blackburn and Bos (2005), it's great stuff, and the de-facto introduction to Computational Semantics! I recently extended their system to cover the partition-theory of questions (question answering is a beast!), as proposed by Groenendijk and Stokhof (1982), but unfortunately, the theory has a complexity of O(n²) over the domain of individuals. While doing so, I found B&B's implementation to be a bit, erhm… hackish, at places. Still, it is going to really, really help you dive into computational semantics, and it is still a very impressive showcase of what can be done. Also, they deserve extra cool-points for implementing a grammar that is settled in Pulp Fiction (the movie).
And while I'm at it, pick up Prolog. A lot of research in computational semantics is based on Prolog. Learn Prolog Now! is a good intro. I can also recommend "The Art of Prolog" and Covington's "Prolog Programming in Depth" and "Natural Language Processing for Prolog Programmers", the former of which is available for free online.

Chomsky is totally the wrong source to look to for NLP (and he'd say as much himself, emphatically)--see: "Statistical Methods and Linguistics" by Abney.
Jurafsky and Martin, mentioned above, is a standard reference, but I myself prefer Manning and Schütze. If you're serious about NLP you'll probably want to read both. There are videos of one of Manning's courses available online.

If you get through Prolog until the DCG chapter in Learn Prolog Now! mentioned by Mr. Dimitrov above, you'll have a good beginning at getting some semantics into your system, since Prolog gives you a very simple way of maintaining a database of knowledge and belief, which can be updated through question-answering.
As regards the literature, I have one major recommendation for you: run out and buy Speech and Language Processing by Jurafsky & Martin. It is pretty much the book on NLP (the first chapter is available online); used in a frillion university courses but also very readable for the non-linguist and practically oriented, while at the same time going fairly deep into the linguistics problems. I really cannot recommend it enough. Chapters 17, 18 and 21 seem to be what you're looking for (14, 15 and 18 in the first edition); they show you simple lambda notation which translates pretty well to Prolog DCG's with features.
Oh, btw, on getting the masters in linguistics; if NL semantics is what you're into, I'd rather recommend taking all the AI-related courses you can find (although any courses on "plain" linguistic semantics, logic, logical semantics, DRT, LFG/HPSG/CCG, NL parsing, formal linguistic theory, etc. wouldn't hurt...)
Reading Chomsky's original literature is not really useful; as far as I know there are no current implementations that directly correspond to his theories, all the useful stuff of his is pretty much subsumed by other theories (and anyone who stays near linguists for any matter of time will absorb knowledge of Chomsky by osmosis).

I'd highly recommend playing around with the NLTK and reading the NLTK Book. The NLTK is very powerful and easy to get into.

You could try reading up a bit on phrase structured grammers, which is basically the mathematics behind much language processessing. It's actually not that heavy, being largely based on set and graph theory. I studied it many moons ago as part of a discrete math course, and I guess there are many good references available at this stage.
Edit:Not as much as I expected on google, although this one looks like a good learning source.

One of the early explorers into NLP is Noam Chomsky; he wrote small books on the subject in the 50s through the 70s. You may find that engaging reading.

Cycorp have a short description of how their Cyc knowledge base derives meaning from sentences.
By utilising a massive knowledge base of common facts, the system can determine the most logical parse of a sentence.

A simpler place to begin with the building blocks is the look at the documentation for a package that attempts to do it. I'd recommend the Python [Natural Language Toolkit (NLTK)1, particularly because of their well-written, free book, which is filled with examples. It won't get you all the way to what you want (which is an AI-hard problem), but it will give you a good footing. NLTK has parsers, chunkers, context-free grammars, and more.

This is really hard stuff. I'd start off by getting at least a Masters in Linguistics, and then work towards my PhD in computer science, concentrating on NLP.
The problem is that most of us don't have the understanding of what language is. And without that understanding, it's bloody tough to implement a solution.
Other comments give some readings, which are probably fine if you want to get started playing around with a small subset of the problem, but in order to come up with a really robust solution, then there are no shortcuts. You need the academic background in both disciplines.

A very enjoyable readable introduction is The Language Instinct by Steven Pinker. It goes into the Chomsky stuff and also tells interesting stories from the evolutionary biology angle. Might be worth starting with something like that before diving into Chomsky's papers and related work, if you're new to the subject.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string