Logical fallacy detection and/or identification with natural language processing (NLP)

Is there a package or methodology in existence for the detection of flawed logical arguments in text?
I was hoping for something that would work for text that is not written in an academic setting (such as a logic class). It might be a stretch, but I would like something that can identify where an argument is being made and pinpoint the logical error. A possible use for this would be marking errors in editorial articles.
I don't need anything polished. I wouldn't mind working to develop something either, so I'm really looking for what's out there in the wild right now.

That's a difficult problem, because you'll have to map natural language to some logical representation, and deal with ambiguity in the process.
The Attempto Project may be interesting for you. It has several tools that you can try online. In particular, RACE may do something close to what you want: it checks the given assertions for consistency. But the bigger issue here is transforming free text into logical forms in the first place.
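RACE itself is a web service, but if you end up rolling your own, the consistency/inference half of the problem can be prototyped with NLTK's logic tools. The formulas below are hand-written; getting them out of free text is exactly the hard part mentioned above. A minimal sketch:

```python
# Minimal sketch: checking whether hand-written logical assertions entail a
# conclusion, using NLTK's resolution prover. This covers only the reasoning
# step; converting free text into these formulas is the hard part.
from nltk.sem.logic import Expression
from nltk.inference import ResolutionProver

read = Expression.fromstring

assumptions = [
    read('all x.(man(x) -> mortal(x))'),   # "All men are mortal."
    read('man(socrates)'),                 # "Socrates is a man."
]
goal = read('mortal(socrates)')            # "Socrates is mortal."

prover = ResolutionProver()
print(prover.prove(goal, assumptions))     # True: the conclusion follows

# A crude inconsistency check: if the negation of an assertion is provable
# from the others, the set contradicts itself.
print(prover.prove(read('-man(socrates)'), assumptions))  # False: no contradiction
```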

For an ontology of logical axioms, OpenCyc and the commercial full Cyc ontologies might be worth investigating as well. CycL is the language used to model the logical assertions, and the Cyc engine is capable of logical inference. The source for OpenCyc can be found in the OpenCyc SourceForge project. The Cyc Wikipedia page also has good information.

Yes, this is a very nasty problem. I would suggest you try to focus on a narrow domain. For example, if you are looking for logic errors in cancer determination, you have to decide which type of cancer as well as what you are trying to resolve, e.g. correct treatment plans, correct observations, correct procedures, correct stage determination, etc. Then you have to find the taxonomy or ontology for that specific cancer, e.g. MEDLINE. So you will likely have to focus on ONLY lung cancer, then only a subset of lung cancer types, and only observations indicating lung cancer. Then you will have to identify your corpus, knowledge trees, and entity relationships, and then worry about negation detection, hypotheticals, and subject detection. If healthcare doesn't float your boat, I hear another challenging domain for logic errors is the legal/law industry.
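As a tiny illustration of the "negation detection" step mentioned above, a NegEx-style trigger-window heuristic is often the first thing people try. The trigger list and window size below are invented for illustration; a real clinical pipeline would need far more care:

```python
# Simplified NegEx-style negation detection: a concept mention is treated as
# negated if a negation trigger appears within a few tokens before it.
# Trigger list and window size are illustrative only.
import re

NEGATION_TRIGGERS = {"no", "denies", "without", "negative", "absent"}
WINDOW = 5  # look this many tokens back from the concept mention

def is_negated(text: str, concept: str) -> bool:
    tokens = re.findall(r"[a-z']+", text.lower())
    concept_tokens = concept.lower().split()
    for i in range(len(tokens) - len(concept_tokens) + 1):
        if tokens[i:i + len(concept_tokens)] == concept_tokens:
            preceding = tokens[max(0, i - WINDOW):i]
            return any(t in NEGATION_TRIGGERS for t in preceding)
    return False

print(is_negated("Patient denies shortness of breath.", "shortness of breath"))   # True
print(is_negated("Patient reports shortness of breath.", "shortness of breath"))  # False
```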

Related

Anyone know of any real systems using Computational Semantics with Lambda Calculus?

I was wondering if Computational Semantics is actually used in any real-world system? (Simple examples here and here). I would like to see how an actual system works.
It seems like there are a bunch of issues with actually using Computational Semantics in any real world system:
It seems that just labeling sentences with part-of-speech tags is error-prone.
But you also need a reliable parse tree, which is error-prone, and there can be many valid trees for one sentence.
Finding which pronouns refer to which entities is error-prone.
Word-sense disambiguation is another source of errors, and multiple meanings can be valid in the same context.
Any context-free grammar of English I can find seems to be incomplete.
Finally, after all these sources of error are dodged, we can finally convert the sentence to FOL with Computational Semantics!
Also, I can't seem to figure out how to deal with prepositions in Computational Semantics.
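To make that last FOL step concrete, here is a minimal sketch using the example semantic grammar that ships with NLTK (assuming the book grammars have been downloaded with nltk.download('book_grammars')). It only covers sentences in its tiny lexicon, which is itself a good illustration of the coverage problem:

```python
# Toy semantic parsing with NLTK's example grammar. Coverage is deliberately
# tiny; this only works for sentences in the grammar's small lexicon.
from nltk import load_parser

# nltk.download('book_grammars')  # uncomment on first run
parser = load_parser('grammars/book_grammars/simple-sem.fcfg', trace=0)
tokens = 'Angus gives a bone to every dog'.split()
for tree in parser.parse(tokens):
    # The SEM feature on the root node is a lambda-calculus / FOL expression.
    print(tree.label()['SEM'])
```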
Is this really just an academic exercise or is Computational Semantics actually useful?
There are several better approaches to natural language than simple lambda calculus and context-free grammars, e.g. HPSG, Montague Grammar, TAG, ...
Word-sense disambiguation can be handled by Markov chains, for example.
Siri, Google Now, Cortana and IBM Watson are some examples of real-world systems.
Google Translate is another application that uses Computational Semantics.
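As a small aside on the disambiguation point: a much simpler (and admittedly weak) baseline than any Markov-chain model is the Lesk algorithm, which NLTK ships out of the box (requires the WordNet corpus):

```python
# Dictionary-overlap word-sense disambiguation (Lesk) via NLTK and WordNet.
# Requires: nltk.download('wordnet')
from nltk.wsd import lesk

context = "I went to the bank to deposit my paycheck".split()
sense = lesk(context, 'bank', pos='n')  # pick the noun sense with most gloss overlap
if sense is not None:
    print(sense.name(), '-', sense.definition())
```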
I believe (but don't quote me on this) that the technology spun out of the now-defunct Natural Language Theory and Technology group at the Palo Alto Research Center (PARC, formerly Xerox PARC) used the lambda calculus to draw inferences about textual entailments. I only worked there for a summer as a freshman, so I was blissfully ignorant of most of the goings-on there.
Anyway, that technology was developed over roughly 30 years, and Powerset bought the rights to all of it for $15 million, attempting to disrupt smart search in general. Then Bing came along, gobbled it up, and went on to absorb the entire research group as well. The principal investigators now work solely as adjunct professors at Stanford. Sad.

How to implement a "Generalisation" in SCL

Is it possible for a generalisation in UML to be implemented in Simatic SCL code (or Structured text code)?
The definition of a Generalisation in UML:
A generalisation is a relationship between a more general classifier and a
more specific classifier. Each instance of the specific classifier is also an
indirect instance of the general classifier. Thus, the specific classifier
inherits the features of the more general classifier.
Features specified for instances of the general classifier are implicitly
specified for instances of the specific classifier. Any constraint applying
to instances of the general classifier also applies to instances of the
specific classifier.
In general the answer to this is no, not really. All means of programming PLCs (ladder, ST, FBD, etc.) are generally only very lightly abstracted from the actual machine code. They are closer to assembly wrappers than to anything we would think of as a modern development language. Structured Text is closer to a very primitive Pascal: it lacks almost all object-oriented features.
The thing is that PLCs and PLC programmers have long been used to an approach of extreme micromanagement when it comes to developing programs for them. The reasons for this are many, some more valid than others. Scott Whitlock wrote a good bit here outlining some of those reasons. A big one is that maintenance staff on the factory floor are often the ones troubleshooting the machines, and having clear, non-abstract, state-machine information available to them is much more valuable than an elegant, minimal formulation that strokes the ego of the system developer.
PLC programming is a ruthlessly practical industry. If you have the choice between something 10% more practical and something 90% more elegant, the practical solution will always win.
With that said - there are some who are playing in this area. I suggest a quick read of this article for some examples of trying to make ST work a bit like you are suggesting. Still, I would be cautious before putting anything like this to work in a real factory with real machines that need to be both safe and reliably making money.

NLP: Language Analysis Techniques and Algorithms

Situation:
I wish to perform a Deep-level Analysis of a given text, which would mean:
Ability to extract keywords and assign importance levels based on contextual usage.
Ability to draw conclusions on the mood expressed.
Ability to hint at the education level of the writer (Microsoft Word does this a little bit, but I want something more automated)
Ability to mix and match phrases and find certain communication patterns
Ability to draw substantial meaning out of it, so that it can be quantified and can be processed for answering by a machine.
Question:
What kind of algorithms and techniques need to be employed for this?
Is there a software that can help me in doing this?
When you figure out how to do this, please contact DARPA, the CIA, the FBI, and every other U.S. intelligence agency. Projects like these are the subject of current research worth many millions in grants. ;)
That being said, you'll need to process the text in layers and analyze at each of those layers. For items 2 and 3 you'll find that training an SVM on word n-grams (try n = 3) will help. For items 1 and 4 you'll want deeper analysis: use a tool like NLTK, or one of the many other parsers, to find the subject words in sentences and the words related to them. Also use WordNet (from Princeton) to find the most common senses used and take those as keywords.
Item 5 is extremely challenging. I think intelligent use of the data above can give you what you want, but you'll need all your grammatical and programming knowledge, and it will still be very coarse-grained.
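Here is a minimal sketch of the "SVM on n-grams" idea for items 2 and 3, using scikit-learn. The texts and labels below are invented purely for illustration; you would substitute your own annotated data:

```python
# Sketch: word n-gram features + linear SVM for mood classification.
# The toy dataset below is invented purely for illustration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

texts = [
    "I absolutely love this, what a wonderful day",
    "This is fantastic news, I'm thrilled",
    "I hate everything about this, terrible",
    "What an awful, miserable experience",
]
moods = ["positive", "positive", "negative", "negative"]

model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 3), lowercase=True),  # unigrams to trigrams
    LinearSVC(),
)
model.fit(texts, moods)
print(model.predict(["what a terrible day"]))  # likely 'negative' given the toy data
```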
It sounds like you might be open to some experimentation, in which case a toolkit approach might be best. If so, look at the NLTK Natural Language Toolkit for Python. It is open source under the Apache license, and there are a couple of excellent books about it (including one from O'Reilly that is also released online under a Creative Commons license).

Programmatic parsing and understanding of language (English)

I am looking for some resources pertaining to the parsing and understanding of English (or just human language in general). While this is obviously a fairly complicated and wide field of study, I was wondering if anyone had any book or internet recommendations for study of the subject. I am aware of the basics, such as searching for copulas to draw word relationships, but anything you guys recommend I will be sure to thoroughly read.
Thanks.
Check out WordNet.
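To get a quick feel for what WordNet offers, NLTK exposes it directly (assuming the WordNet corpus has been downloaded with nltk.download('wordnet')):

```python
# Looking up senses, definitions and hypernyms with WordNet via NLTK.
# Requires: nltk.download('wordnet')
from nltk.corpus import wordnet as wn

for synset in wn.synsets('bank')[:3]:
    print(synset.name(), '-', synset.definition())

# Hypernym chain for the second noun sense of 'bank'
# (the financial institution in WordNet 3.0).
print(wn.synset('bank.n.02').hypernym_paths()[0])
```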
You probably want a book like "Representation and Inference for Natural Language - A First Course in Computational Semantics"
http://homepages.inf.ed.ac.uk/jbos/comsem/book1.html
Another way is to look at existing tools that already do the job, built on the basis of research papers: http://nlp.stanford.edu/index.shtml
I've used this tool once, and it's very nice. There's even an online version that lets you parse English and draws dependency trees and so on.
So you can start taking a look at their papers or the code itself.
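If you want to reproduce those dependency trees programmatically rather than through the online demo, one route is NLTK's client for the Stanford CoreNLP server. This is a sketch under the assumption that a CoreNLP server is already running locally on port 9000:

```python
# Dependency parsing through a locally running Stanford CoreNLP server.
# Start the server first, e.g.:
#   java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000
from nltk.parse.corenlp import CoreNLPDependencyParser

parser = CoreNLPDependencyParser(url='http://localhost:9000')
parse, = parser.raw_parse('The quick brown fox jumps over the lazy dog.')
print(parse.to_conll(4))  # word / tag / head / relation, one token per line
```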
Anyway, take into consideration that in any field, what you get from such generic tools is almost never exactly what you want, in the sense that the semantics these tools attribute is not what you would expect. In most cases, given a specific constrained domain, it's preferable to roll your own parser and do your best to avoid ambiguities beforehand.
The process that you describe is called natural language understanding. There are various algorithms and software tools that have been developed for this purpose.

How to choose a Feature Selection Algorithm? - advice

Is there a research paper/book I can read that will tell me, for the problem at hand, what sort of feature selection algorithm would work best?
I am trying to simply classify Twitter messages as positive/negative (to begin with). I started out with frequency-based feature selection (having started with the NLTK book) but soon realised that for similar problems various individuals have chosen different algorithms.
Although I could try frequency-based selection, mutual information, information gain and various other algorithms, the list seems endless, and I was wondering whether there is a more efficient approach than trial and error.
Any advice?
Have you tried the book I recommended on your last question? It's freely available online and entirely about the task you are dealing with: Opinion Mining and Sentiment Analysis by Pang and Lee. Chapter 4 ("Extraction and Classification") is just what you need!
I did an NLP course last term, and it became pretty clear that sentiment analysis is something that nobody really knows how to do well (yet). Doing this with unsupervised learning is of course even harder.
There's quite a lot of research going on regarding this, some of it commercial and thus not open to the public. I can't point you to any research papers but the book we used for the course was this (google books preview). That said, the book covers a lot of material and might not be the quickest way to find a solution to this particular problem.
The only other thing I can point you towards is to try googling around, maybe in scholar.google.com for "sentiment analysis" or "opinion mining".
Have a look at the NLTK movie_reviews corpus. The reviews are already categorized as pos/neg and might help you with training your classifier, although the language you find on Twitter is probably very different from that of movie reviews.
As a last note, please post any successes (or failures, for that matter) here. This issue is sure to come up again at some point.
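To make the movie_reviews suggestion concrete, here is a minimal bag-of-words Naive Bayes baseline along the lines of the NLTK book. The feature design is deliberately crude, and a Twitter classifier would need tweet-specific preprocessing on top of it:

```python
# Bag-of-words Naive Bayes baseline on NLTK's movie_reviews corpus.
# Requires: nltk.download('movie_reviews')
import random
from nltk import FreqDist, NaiveBayesClassifier, classify
from nltk.corpus import movie_reviews

documents = [(list(movie_reviews.words(fileid)), category)
             for category in movie_reviews.categories()
             for fileid in movie_reviews.fileids(category)]
random.shuffle(documents)

# The 2000 most frequent words in the corpus serve as binary presence features.
all_words = FreqDist(w.lower() for w in movie_reviews.words())
word_features = [w for w, _ in all_words.most_common(2000)]

def document_features(words):
    present = set(words)
    return {f'contains({w})': (w in present) for w in word_features}

featuresets = [(document_features(words), category) for words, category in documents]
train_set, test_set = featuresets[200:], featuresets[:200]

classifier = NaiveBayesClassifier.train(train_set)
print(classify.accuracy(classifier, test_set))  # held-out accuracy on 200 reviews
classifier.show_most_informative_features(5)
```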
Unfortunately, there is no silver bullet for anything when dealing with machine learning. This is usually referred to as the "No Free Lunch" theorem. Basically, a number of algorithms will work for a given problem; some do better on some problems and worse on others, and overall they all perform about the same. The same feature set may cause one algorithm to perform better and another to perform worse on a given data set, and for a different data set the situation could be completely reversed.
Usually what I do is pick a few feature selection algorithms that have worked for others on similar tasks and then start with those. If the performance I get using my favorite classifiers is acceptable, scrounging for another half percentage point probably isn't worth my time. But if it's not acceptable, then it's time to re-evaluate my approach, or to look for more feature selection methods.
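If you do end up trialling a few criteria, scikit-learn makes that loop cheap. Here is a hypothetical sketch comparing chi-squared and mutual information, two of the criteria mentioned above, on an invented toy dataset:

```python
# Comparing feature-selection criteria (chi2 vs. mutual information) on
# bag-of-words features; the tiny dataset here is invented for illustration.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, chi2, mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

texts = ["love this phone", "great battery life", "awful screen", "terrible support",
         "really happy with it", "worst purchase ever", "fantastic camera", "broke in a week"]
labels = [1, 1, 0, 0, 1, 0, 1, 0]   # 1 = positive, 0 = negative

for name, score_func in [("chi2", chi2), ("mutual info", mutual_info_classif)]:
    model = make_pipeline(
        CountVectorizer(),
        SelectKBest(score_func=score_func, k=5),  # keep the 5 highest-scoring features
        MultinomialNB(),
    )
    scores = cross_val_score(model, texts, labels, cv=2)
    print(name, scores.mean())
```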

Resources