Algorithm for Negating Sentences - nlp

I was wondering if anyone was familiar with any attempts at algorithmic sentence negation.
For example, given a sentence like "This book is good" provide any number of alternative sentences meaning the opposite like "This book is not good" or even "This book is bad".
Obviously, accomplishing this with a high degree of accuracy would probably be beyond the scope of current NLP, but I'm sure there has been some work on the subject. If anybody knows of any work, care to point me to some papers?

While I'm not aware of any work that specifically looks at automatically generating negated sentences, I imagine a good place to start would be to read up on linguistics work in formal semantics and pragmatics. A good accessible introduction would be Steven C. Levinson's Pragmatics book.
One issue that I think you'll run into is that it can be very difficult to negate all the information that is conveyed by a sentence. For example, take:
John fixed the vase that he broke.
Even if you change this to John did not fix the vase that he broke, there is a presupposition that there is a vase and that John broke it.
Similarly, simply negating the sentence John stopped using drugs as John did not stop using drugs still conveys that John, at one point, used drugs. A more thorough negation would be John never used drugs.
Some existing natural language processing (NLP) work that you might want to look at is MacCartney and Manning's 2007 paper Natural Logic for Textual Inference. In this paper they use George Lakoff's notion of Natural Logic and Sanchez Valencia's monotonicity calculus to create software that automatically determines whether one sentence entails another. You could probably use some of their techniques for detecting non-entailment to artificially construct negated and contradicting sentences.

I'd recommend checking out wordnet. You can use it to lookup antonyms for a word, so you could conceivably replace "bad" with "not good" since bad is an antonym of good. NLTK has a simple python interface to wordnet.
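If you go this route, a minimal sketch of the antonym lookup through NLTK's WordNet interface might look like this (it assumes the wordnet corpus data has been downloaded; the function name is just for illustration):

from nltk.corpus import wordnet as wn

def antonyms(word):
    # Collect antonym lemmas across every WordNet sense of the word.
    results = set()
    for synset in wn.synsets(word):
        for lemma in synset.lemmas():
            for ant in lemma.antonyms():
                results.add(ant.name())
    return results

print(antonyms("good"))  # typically includes 'bad'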

The naïve way, of course, is to add "not" right after {am, are, is}. I have no idea how well this will work in your setting, though; it will probably only work with predicate-like sentences.
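For what it's worth, a toy version of that naive rule can be a single substitution (a sketch only; it will mangle anything beyond simple predicative sentences):

import re

def naive_negate(sentence):
    # Insert "not" after the first occurrence of am/are/is.
    return re.sub(r"\b(am|are|is)\b", r"\1 not", sentence, count=1, flags=re.IGNORECASE)

print(naive_negate("This book is good"))  # -> "This book is not good"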

For simple sentences, parse the sentence looking for adverbs or adjectives according to English grammar rules and substitute an antonym if only one meaning exists. Otherwise, use the correct English negation rule to negate the verb (e.g. is -> is not).
High level algorithm:
Look up each word's type (noun, verb, adjective, adverb, conjunction, etc...)
Infer sentence structure from word type sequences (Your sentence was: determiner, noun, verb, adjective; this is known to be a simple sentence.)
For simple sentences, choose one invertible word and invert it, either by using an antonym or by negating the verb.
For more complex sentences, such as those with subordinate clauses, you will need more complex analysis, but for simple sentences this should be feasible; a rough sketch for the simple case follows.
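The sketch below assumes NLTK with its tagger and WordNet data installed; the adjective-first policy and the copula list are just illustrative choices, not part of any standard API:

import nltk
from nltk.corpus import wordnet as wn

def negate_simple(sentence):
    tokens = nltk.word_tokenize(sentence)
    tagged = nltk.pos_tag(tokens)
    # First pass: look for an adjective that has a WordNet antonym.
    for i, (word, tag) in enumerate(tagged):
        if tag.startswith("JJ"):
            ants = [a.name() for s in wn.synsets(word, pos=wn.ADJ)
                    for l in s.lemmas() for a in l.antonyms()]
            if ants:
                return " ".join(tokens[:i] + [ants[0]] + tokens[i + 1:])
    # Fallback: negate the copula.
    for i, word in enumerate(tokens):
        if word.lower() in {"am", "is", "are"}:
            return " ".join(tokens[:i + 1] + ["not"] + tokens[i + 1:])
    return sentence  # give up on anything more complex

print(negate_simple("This book is good"))  # -> "This book is bad"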

There's a similar process for first-order logic. The usual algorithm is to map P to not P, and then perform valid translations to move the not somewhere convenient, e.g.:
Original: (not R(x) => exists(y) (O(y) and P(x, y)))
Negate it: not (not R(x) => exists(y) (O(y) and P(x, y)))
Rearrange: not (R(x) or exists(y) (O(y) and P(x, y)))
not R(x) and not exists(y) (O(y) and P(x, y))
not R(x) and forall(y) not (O(y) and P(x, y))
not R(x) and forall(y) (not O(y) or not P(x, y))
Performing the same on English you'd be negating "If it's not raining here, then there is some activity that is an outdoors activity and can be performed here" to "It is NOT the case that ..." and finally into "It's not raining and every possible activity is either not for outdoors or can't be performed here."
Natural language is a lot more complicated than first-order logic, of course... but if you can parse the sentence into something where the words "not", "and", "or", "exists" etc. can be identified, then you should be able to perform similar translations.
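As a concrete illustration, here is a minimal sketch of that negation-pushing step over formulas encoded as nested tuples; the encoding and the connective names are just illustrative, and quantified variables are carried along unchanged:

def negate(f):
    # Return the negation of formula f, pushed toward the atoms.
    if isinstance(f, str):              # atomic proposition
        return ("not", f)
    op = f[0]
    if op == "not":                     # not not A -> A
        return f[1]
    if op == "and":
        return ("or", negate(f[1]), negate(f[2]))
    if op == "or":
        return ("and", negate(f[1]), negate(f[2]))
    if op == "implies":                 # not (A -> B) -> A and not B
        return ("and", f[1], negate(f[2]))
    if op == "exists":                  # not exists x. A -> forall x. not A
        return ("forall", f[1], negate(f[2]))
    if op == "forall":
        return ("exists", f[1], negate(f[2]))
    raise ValueError("unknown connective: " + op)

original = ("implies", ("not", "R"), ("exists", "y", ("and", "O", "P")))
print(negate(original))
# ('and', ('not', 'R'), ('forall', 'y', ('or', ('not', 'O'), ('not', 'P'))))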

For a rule-based negation approach, you can take a look at the Python module negate [1].
[1] Disclaimer: I am the author of the module.
As for some papers related to the topic, you can take a look at:
Understanding by Understanding Not: Modeling Negation in Language Models
An Analysis of Natural Language Inference Benchmarks through the Lens of Negation
Not another Negation Benchmark: The NaN-NLI Test Suite for Sub-clausal Negation

Nice demos using NLTK - http://text-processing.com/demo and a short writeup - http://text-processing.com/demo/sentiment/.

Related

What's the correct implementation of "bag of n-grams"?

I'm reading François Chollet's book "Deep Learning with Python", and on page 204 it suggests that the phrase The cat sat on the mat. would produce the following 2-grams:
{"The", "The cat", "cat", "cat sat", "sat",
"sat on", "on", "on the", "the", "the mat", "mat"}
However, every implementation of n-grams that I have seen (nltk, tensorflow) encodes the same phrase as follows:
[('The', 'cat'), ('cat', 'sat'), ('sat', 'on'), ('on', 'the'), ('the', 'mat.')]
Am I missing some detail? (I'm new to natural language processing, so that might be the case.)
Or is the book's implementation wrong/outdated?
I want to slightly expand on the other answer given, specifically to the "clearly wrong". While I agree that it is not the standard approach (to my knowledge!), there is an important definition in the mentioned book, just before the shown excerpt, which states:
Word n-grams are groups of N (or fewer) consecutive words that you can extract from a sentence. The same concept may also be applied to characters instead of words
(bold highlight by me). It seems that Chollet defines n-grams slightly differently from the common interpretation (namely, that an n-gram has to consist of exactly n words/chars etc.). With that, the subsequent example is entirely within the defined circumstances, although you will likely find varying implementations of this in the real world.
One example aside from the mentioned Tensorflow/NLTK implementation would be scikit-learn's TfidfVectorizer, which has the parameter ngram_range. This is basically something in between Chollet's definition and a strict interpretation, where you can select an arbitrary minimum/maximum amount of "grams" for a single unit, which are then built similar to the above example where a single bag can have both unigrams and bigrams, for example.
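For concreteness, here is a small sketch showing the two interpretations side by side with NLTK and scikit-learn; tokenization details (lowercasing, punctuation handling) differ slightly from the book's hand-built example, and get_feature_names_out assumes a recent scikit-learn:

from nltk import ngrams
from sklearn.feature_extraction.text import CountVectorizer

text = "The cat sat on the mat."
tokens = text.split()

# Strict bigrams, NLTK/TensorFlow style:
print(list(ngrams(tokens, 2)))
# [('The', 'cat'), ('cat', 'sat'), ('sat', 'on'), ('on', 'the'), ('the', 'mat.')]

# "N or fewer" style via scikit-learn's ngram_range=(1, 2):
vec = CountVectorizer(ngram_range=(1, 2))
vec.fit([text])
print(vec.get_feature_names_out())
# ['cat' 'cat sat' 'mat' 'on' 'on the' 'sat' 'sat on' 'the' 'the cat' 'the mat']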
The book's implementation is incorrect; it mixes unigrams (1-grams) with bigrams (2-grams).

Chomsky hierarchy - examples with real languages

I'm trying to understand the four levels of the Chomsky hierarchy by using some real languages as models. He thought that all natural languages can be generated through a context-free grammar, but Shieber contradicted this theory, proving that languages such as Swiss German can only be generated through a context-sensitive grammar. Since Chomsky is from the US, I guess that the American language is an example of a context-free grammar. My questions are:
Are there languages which can be generated by regular grammars (type 3)?
Since recursively enumerable grammars can generate all languages, why not use those? Are they too complicated and less linear?
What is the characteristic of Swiss German that makes it impossible to generate through context-free grammars?
I don't think this is an appropriate question for StackOverflow, which is a site for programming questions. But I'll try to address it as best I can.
I don't believe Chomsky was ever under the impression that natural languages could be described with a Type 2 grammar. It is not impossible for noun-verb agreement (singular/plural) to be represented in a Type 2 grammar, because the number of cases is finite, but the grammar is awkward. But there are more complicated features of natural language, generally involving specific rules about how word order can be rearranged, which cannot be captured in a simple grammar. It was Chomsky's hope that a second level of analysis -- "transformational grammars" -- could usefully capture these rearrangement rules without making the grammar computationally intractable. That would require finding some systematization which fits between Type 1 and Type 2, because Type 1 grammars are not computationally tractable.
Since we do, in fact, correctly parse our own languages, it stands to reason that there be some computational algorithm. But that line of reasoning might not actually be correct, because there is a limit to the complexity of a sentence which we can parse. Any finite language is regular (Type 3); only languages which have an unlimited number of potential sentences require more sophisticated grammars. So a large collection of finite patterns could suffice to understand natural language. These patterns might be a lot more sophisticated than regular expressions, but as long as each pattern only applies to a sentence of limited length, the pattern could be expressed mathematically as a regular expression. (The most obvious one is to just list all possible sentences as alternatives, which is a regular expression if the number of possible sentences is finite. But in many cases, that might be simplified into something more useful.)
As I understand it, modern attempts to deal with natural language using so-called "deep learning" are essentially based on pattern recognition through neural networks, although I haven't studied the field deeply and I'm sure that there are many complications I'm skipping over in that simple description.
Noam Chomsky is an American, but "American" is not a language (and if it were, it might well be Spanish, which is spoken by the majority of the residents of the Americas). As far as I know, his first language is English, but he is not by any means unilingual, although I don't know how much Swiss German he speaks. Certainly, there have been criticisms over the years that his theories have an Indo-European bias. Certainly, I don't claim competence in Swiss German, despite having lived several years in Switzerland, but I did read Shieber's paper and some of the follow-ups and discussed them with colleagues who were native Swiss German speakers. (Opinions were divided.)
The basic issue has to do with morphological agreement in lists. As I mentioned earlier, many languages (all Indo-European languages, as far as I know) insist that the form of the verb agrees with the form of the subject, so that a singular subject requires a singular verb and a plural subject requires a plural verb. [Note 1]
In many languages, agreement is also required between adjectives and nouns, and this is not just agreement in number but also agreement in grammatical gender (if applicable). Also, many languages require agreement between the specific verb and the article or adjective of the object of the verb. [Note 2]
Simple agreement can be handled by a context-free (Type 2) grammar, but there is a huge restriction. To put it simply, a context-free grammar can only deal with parenthetic constructions. This can work even if there is more than one type of parenthesis, so a context-free grammar can insist that an [ be matched with a ] and not a ). But the grammar must have this "inside-out" form: the matching symbols must be in the reverse order to the symbols being matched.
One consequence of this is that there is a context-free grammar for palindromes -- sentences which read the same in both directions, which effectively means that they consist of a phrase followed by its reverse. But there is no context-free grammar for duplications: a language consisting of repeated phrases. In the palindrome, the matching words are in the reverse order to the matched words; in the duplicate, they are in the same order. Hence the difference.
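To make the palindrome/duplication contrast concrete, here is a toy grammar for even-length palindromes over {a, b}, written with NLTK's CFG tools; no analogous context-free grammar exists for the copy language of repeated phrases, where the halves match in the same order:

import nltk

palindromes = nltk.CFG.fromstring("""
    S -> 'a' S 'a' | 'b' S 'b' | 'a' 'a' | 'b' 'b'
""")
parser = nltk.ChartParser(palindromes)
trees = list(parser.parse("a b b a".split()))
print(trees[0])
# (S a (S b b) a)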
Agreement in natural languages mostly follows this pattern, and some of the exceptions can be dealt with by positing simple rules for reordering finite numbers of phrases -- Chomsky's transformational grammar. But Swiss German features at least one case where agreement is not parenthetic, but rather in the same order. [Note 3] This involves the feature of German in which many sentences are in the order Subject-Object-Verb, which can be extended to Subject Object Object Object... Verb Verb Verb... when the verbs have indirect objects. Shieber showed some examples in which object-verb agreement is ordered, even when there are intervening phrases.
In the general case, such "cross-serial agreement" cannot be expressed in a context-free grammar. But there is a huge underlying assumption: that the length of the agreeing series be effectively unlimited. If, on the other hand, there are a finite number of patterns actually in common use, the "deep learning" model referred to above would certainly be able to handle it.
(I want to say that I'm not endorsing deep learning here. In fact, the way "artificial intelligence" is "trained" involves the use of trainers whose cultural biases may well not be sufficiently understood. This could easily lead to the same unfortunate consequences alluded to in my first footnote.)
Notes
This is not the case in many native American languages, as Whorf pointed out. In those languages, using a singular verb with a plural noun implies that the action was taken collectively, while using a plural verb would imply that the action was taken separately. Roughly transcribed to English, "The dogs run" would be about a bunch of dogs independently running in different directions, whereas "The dogs runs" would be about a single pack of dogs all running together. Some European "teachers" who imposed their own linguistic prejudices on native languages failed to correctly understand this distinction, and concluded that the native Americans must be too primitive to even speak their own language "correctly"; to "correct" this "deficiency", they attempted to eliminate the distinction from the language, in some cases with success.
These rules, not present in English, are one of the reasons some English speakers are tortured by learning German. I speak from personal experience.
Ordered agreement, as opposed to parenthetic agreement, is known as cross-serial dependency.

Extracting <subject, predicate, object> triplet from unstructured text

I need to extract simple triplets from unstructured text. Usually it is of the form noun - verb - noun, so I have tried POS tagging and then extracting nouns and verbs from the neighbourhood.
However, this leads to a lot of cases and gives low accuracy.
Will Syntactic/semantic parsing help in this scenario?
Will ontology based information extraction be more useful?
I expect that syntactic parsing would be the best fit for your scenario. Some trivial template-matching method with POS tags might work, where you find verbs preceded and followed by a single noun, and take the former to be the subject and the latter the object. However, it sounds like you've already tried something like that -- unless your neighborhood extraction ignores word order (which would be a bit silly - you'd be guessing which noun was the subject and which was the object, and that's assuming exactly two nouns in each sentence).
Since you're looking for {s, v, o} triplets, chances are you won't need semantic or ontological information. That would be useful if you wanted more information, e.g. agent-patient relations or deeper knowledge extraction.
{s,v,o} is shallow syntactic information, and given that syntactic parsing is considerably more robust and accessible than semantic parsing, that might be your best bet. Syntactic parsing will be sensitive to simple word re-orderings, e.g. "The hamburger was eaten by John." => {John, eat, hamburger}; you'd also be able to specifically handle intransitive and ditransitive verbs, which might be issues for a more naive approach.
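As a sketch of the dependency-parsing route, here is a minimal {subject, verb, object} extractor using spaCy; it assumes the en_core_web_sm model is installed, only handles active-voice clauses, and passive sentences would additionally need the nsubjpass/agent labels handled:

import spacy

nlp = spacy.load("en_core_web_sm")

def extract_triplets(text):
    triplets = []
    for token in nlp(text):
        if token.pos_ == "VERB":
            subjects = [c for c in token.children if c.dep_ == "nsubj"]
            objects = [c for c in token.children if c.dep_ in ("dobj", "obj")]
            for s in subjects:
                for o in objects:
                    triplets.append((s.text, token.lemma_, o.text))
    return triplets

print(extract_triplets("John ate the hamburger."))  # [('John', 'eat', 'hamburger')]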

Infinitive form disambiguation

How do you decide whether a word in a sentence is an infinitive or not?
For example, here "fixing" is an infinitive:
Fixing the door was also easy but fixing the window was very hard.
But in
I am fixing the door
it is not. How do people disambiguate these cases?
To elaborate on my comment:
In PoS tagging, choosing between a gerund (VBG) and a noun (NN) is quite subtle and has many special cases. My understanding is that fixing should be tagged as a gerund in your first sentence, because it can be modified by an adverb in that context. Citing from the Penn PoS tagging guidelines (page 19):
"While both nouns and gerunds can be preceded by an article or a possessive pronoun, only a noun (NN) can be modified by an adjective, and only a gerund (VBG) can be modified by an adverb."
EXAMPLES:
Good/JJ cooking/NN is something to enjoy.
Cooking/VBG well/RB is a useful skill.
Assuming you meant 'automatically disambiguate', this task requires a bit of processing (pos-tagging and syntactic parsing). The idea is to find instances of a verb that are not preceded by an agreeing Subject Noun Phrase. If you also want to catch infinitive forms like "to fix", just add that to the list of forms you are looking for.
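A rough heuristic along those lines, using only NLTK's tagger: treat a VBG immediately preceded by a form of "be" as part of a progressive verb phrase, otherwise as a gerund-like use. This is only an approximation of the fuller parsing approach described above, and tagger output may vary by version:

import nltk

BE_FORMS = {"am", "is", "are", "was", "were", "be", "been", "being"}

def gerund_like_uses(sentence):
    tagged = nltk.pos_tag(nltk.word_tokenize(sentence))
    hits = []
    for i, (word, tag) in enumerate(tagged):
        if tag == "VBG":
            prev = tagged[i - 1][0].lower() if i > 0 else ""
            hits.append((word, "progressive" if prev in BE_FORMS else "gerund-like"))
    return hits

print(gerund_like_uses("Fixing the door was also easy but fixing the window was very hard."))
# e.g. [('Fixing', 'gerund-like'), ('fixing', 'gerund-like')]
print(gerund_like_uses("I am fixing the door"))
# e.g. [('fixing', 'progressive')]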

Mathematica: what is symbolic programming?

I am a big fan of Stephen Wolfram, but he is definitely not one to shy away from tooting his own horn. In many references, he extols Mathematica as a different symbolic programming paradigm. I am not a Mathematica user.
My questions are: what is this symbolic programming? And how does it compare to functional languages (such as Haskell)?
When I hear the phrase "symbolic programming", LISP, Prolog and (yes) Mathematica immediately leap to mind. I would characterize a symbolic programming environment as one in which the expressions used to represent program text also happen to be the primary data structure. As a result, it becomes very easy to build abstractions upon abstractions since data can easily be transformed into code and vice versa.
Mathematica exploits this capability heavily. Even more heavily than LISP and Prolog (IMHO).
As an example of symbolic programming, consider the following sequence of events. I have a CSV file that looks like this:
r,1,2
g,3,4
I read that file in:
Import["somefile.csv"]
--> {{r,1,2},{g,3,4}}
Is the result data or code? It is both. It is the data that results from reading the file, but it also happens to be the expression that will construct that data. As code goes, however, this expression is inert since the result of evaluating it is simply itself.
So now I apply a transformation to the result:
% /. {c_, x_, y_} :> {c, Disk[{x, y}]}
--> {{r,Disk[{1,2}]},{g,Disk[{3,4}]}}
Without dwelling on the details, all that has happened is that Disk[{...}] has been wrapped around the last two numbers from each input line. The result is still data/code, but still inert. Another transformation:
% /. {"r" -> Red, "g" -> Green}
--> {{Red,Disk[{1,2}]},{Green,Disk[{3,4}]}}
Yes, still inert. However, by a remarkable coincidence this last result just happens to be a list of valid directives in Mathematica's built-in domain-specific language for graphics. One last transformation, and things start to happen:
% /. x_ :> Graphics[x]
--> Graphics[{{Red,Disk[{1,2}]},{Green,Disk[{3,4}]}}]
Actually, you would not see that last result. In an epic display of syntactic sugar, Mathematica would instead render it as a picture of a red and a green disk.
But the fun doesn't stop there. Underneath all that syntactic sugar we still have a symbolic expression. I can apply another transformation rule:
% /. Red -> Black
Presto! The red circle became black.
It is this kind of "symbol pushing" that characterizes symbolic programming. A great majority of Mathematica programming is of this nature.
Functional vs. Symbolic
I won't address the differences between symbolic and functional programming in detail, but I will contribute a few remarks.
One could view symbolic programming as an answer to the question: "What would happen if I tried to model everything using only expression transformations?" Functional programming, by contrast, can be seen as an answer to: "What would happen if I tried to model everything using only functions?" Just like symbolic programming, functional programming makes it easy to quickly build up layers of abstractions. The example I gave here could easily be reproduced in, say, Haskell using a functional reactive animation approach. Functional programming is all about function composition, higher-level functions, combinators -- all the nifty things that you can do with functions.
Mathematica is clearly optimized for symbolic programming. It is possible to write code in functional style, but the functional features in Mathematica are really just a thin veneer over transformations (and a leaky abstraction at that, see the footnote below).
Haskell is clearly optimized for functional programming. It is possible to write code in symbolic style, but I would quibble that the syntactic representation of programs and data are quite distinct, making the experience suboptimal.
Concluding Remarks
In conclusion, I advocate that there is a distinction between functional programming (as epitomized by Haskell) and symbolic programming (as epitomized by Mathematica). I think that if one studies both, then one will learn substantially more than studying just one -- the ultimate test of distinctness.
Leaky Functional Abstraction in Mathematica?
Yup, leaky. Try this, for example:
f[x_] := g[Function[a, x]];
g[fn_] := Module[{h}, h[a_] := fn[a]; h[0]];
f[999]
Duly reported to, and acknowledged by, WRI. The response: avoid the use of Function[var, body] (Function[body] is okay).
You can think of Mathematica's symbolic programming as a search-and-replace system where you program by specifying search-and-replace rules.
For instance you could specify the following rule
area := Pi*radius^2;
Next time you use area, it'll be replaced with Pi*radius^2. Now, suppose you define a new rule
radius:=5
Now, whenever you use radius, it'll get rewritten into 5. If you evaluate area, it'll get rewritten into Pi*radius^2, which triggers the rewriting rule for radius, and you'll get Pi*5^2 as an intermediate result. This new form will trigger a built-in rewriting rule for the ^ operation, so the expression will get further rewritten into Pi*25. At this point rewriting stops because there are no applicable rules.
You can emulate functional programming by using your replacement rules as functions. For instance, if you want to define a function that adds, you could do
add[a_,b_]:=a+b
Now add[x,y] gets rewritten into x+y. If you want add to only apply for numeric a,b, you could instead do
add[a_?NumericQ, b_?NumericQ] := a + b
Now, add[2,3] gets rewritten into 2+3 using your rule and then into 5 using built-in rule for +, whereas add[test1,test2] remains unchanged.
Here's an example of an interactive replacement rule
a := ChoiceDialog["Pick one", {1, 2, 3, 4}]
a+1
Here, a gets replaced with ChoiceDialog, which then gets replaced with the number the user chose in the dialog that popped up, which makes both quantities numeric and triggers the replacement rule for +. Here, ChoiceDialog has a built-in replacement rule along the lines of "replace ChoiceDialog[some stuff] with the value of the button the user clicked".
Rules can be defined using conditions which themselves need to go through rule-rewriting in order to produce True or False. For instance, suppose you invented a new equation-solving method, but you think it only works when the final result of your method is positive. You could write the following rule
solve[x + 5 == b_] := (result = b - 5; result /; result > 0)
Here, solve[x+5==20] gets replaced with 15, but solve[x + 5 == -20] is unchanged because there's no rule that applies. The condition that prevents this rule from applying is /;result>0. The evaluator essentially looks at the potential output of the rule application to decide whether to go ahead with it.
Mathematica's evaluator greedily rewrites every pattern with one of the rules that apply for that symbol. Sometimes you want finer control, and in such cases you could define your own rules and apply them manually like this
myrules={area->Pi radius^2,radius->5}
area//.myrules
This will apply rules defined in myrules until result stops changing. This is pretty similar to the default evaluator, but now you could have several sets of rules and apply them selectively. A more advanced example shows how to make a Prolog-like evaluator that searches over sequences of rule applications.
One drawback of the current Mathematica version comes up when you need to use Mathematica's default evaluator (to make use of Integrate, Solve, etc.) and want to change the default sequence of evaluation. That is possible but complicated, and I like to think that some future implementation of symbolic programming will have a more elegant way of controlling the evaluation sequence.
As others here already mentioned, Mathematica does a lot of term rewriting. Maybe Haskell isn't the best comparison though, but Pure is a nice functional term-rewriting language (that should feel familiar to people with a Haskell background). Maybe reading their Wiki page on term rewriting will clear up a few things for you:
http://code.google.com/p/pure-lang/wiki/Rewriting
Mathematica uses term rewriting heavily. The language provides special syntax for various forms of rewriting, and special support for rules and strategies. The paradigm is not that "new" and of course it's not unique, but they're definitely on the bleeding edge of this "symbolic programming" thing, alongside the other strong players such as Axiom.
As for comparison to Haskell, well, you could do rewriting there, with a bit of help from scrap your boilerplate library, but it's not nearly as easy as in a dynamically typed Mathematica.
Symbolic shouldn't be contrasted with functional; it should be contrasted with numerical programming. Consider as an example MATLAB vs Mathematica. Suppose I want the characteristic polynomial of a matrix. If I wanted to do that in Mathematica, I could get an identity matrix (I) and the matrix (A) itself into Mathematica, then do this:
Det[A-lambda*I]
And I would get the characteristic polynomial (never mind that there's probably a built-in characteristic polynomial function). On the other hand, if I were in MATLAB, I couldn't do it with base MATLAB, because base MATLAB is only good at calculating finite-precision numbers, not expressions with a free symbol like lambda in them. What you'd have to do is buy the symbolic math add-on, define lambda as its own line of code, and then write this out (wherein it would convert your A matrix to a matrix of rational numbers rather than finite-precision decimals). While the performance difference would probably be unnoticeable for a small case like this, it would probably be much slower than Mathematica in relative terms.
So that's the difference: symbolic languages are interested in doing calculations with perfect accuracy (often using rational numbers as opposed to numerical approximations), whereas numerical programming languages are very good at the vast majority of calculations you would need to do and tend to be faster at the numerical operations they're meant for (MATLAB is nearly unmatched in this regard among higher-level languages - excluding C++, etc.), but piss-poor at symbolic operations.
