Stanford NLP parsers and idioms that have common semantic meaning - nlp

I have parsed the following sentence in the Stanford CoreNLP demo page and the Stanford parser demo page. Although both result in a parse that can imply purpose semantics (hinging on the advcl and the sbar accordingly), clearly the parsers don't capture "in order to" as an expression whose meaning much transcends the relationships between its own word constituents (as in an idiom).
he went to the shop in order to buy food
What kind of parsing, or pre-processing stage, may yield "in order to" as a single unit of meaning, thus making it amenable to semantics derivation same as a preposition affords? e.g. "for" may commonly imply purpose semantics, and having "for" parse as a preposition assists in deriving purpose semantics.

As it currently stands, Stanford NLP does not allow this short of forking the code and fixing one nasty bug with your bare hands.

Related

Best lexicons for sentence vs document level analysis

What are the best lexicons for document-level and sentence-level analysis? I'm using Vader currently for sentence-level analysis, however I'm worried that when I move to the document level, Vader may not perform as well as others.
Similar question to the post here, however more specific.
In addition to the sentiment lexica listed in the linked post, I can recommend aFinn sentiment lexicon.
For sentiment analysis, depending on only lexica may not be be best solution, especially on document level. Language is so flexible that its attributes and notions other than sentiment-laden vocabulary effect semantics deeply.
Some of the core notions are contrastive discource markers (especially for document level), negation and modality.
contrastive discourse markers
There are opinions that have both pros and cons within documents and we tie those via those markers like 'however', 'nevertheless' etc. to convey meaning or an idea. For a bag of words approach, the sentences below are treated the same, yet if people to annotate their sentiment with one label, they may not annotate them with the same one:
The laptop has amazing features, but its screen is killing me.
The laptop's screen is killing me, but it has amazing features.
In general, we evaluate these kind of sentences or paragraphs with the sentiment of the subclause after 'but'. Other contastive discource markers have their own semantics as well. This is inspected in an area called discource analysis.
negation and modality
These notions change semantics as well. So, they cannot be overlooked for both levels. There are studies and papers those used negation and modality triggers with sentiment lexica. You can google it 'negation and modality on sentiment analysis' to see what you can do.
Finally what I can suggest is if you have a domain-specific dataset, you may build your own lexicon using distant supervision.
Hope this helps,
Cheers

What is the difference between syntactic analogy and semantic analogy?

At 15:10 of this video about fastText it mentions syntactic analogy and semantic analogy. But I am not sure what the difference is between them.
Could anybody help explain the difference with examples?
Syntactic means syntax, as in tasks that have to do with the structure of the sentence, these include tree parsing, POS tagging, usually they need less context and a shallower understanding of world knowledge
Semantic tasks mean meaning related, a higher level of the language tree, these also typically involve a higher level understanding of the text and might involve tasks s.a. question answering, sentiment analysis, etc...
As for analogies, he is referring to the mathematical operator like properties exhibited by word embedding, in this context a syntactic analogy would be related to plurals, tense or gender, those sort of things, and semantic analogy would be word meaning relationships s.a. man + queen = king, etc... See for instance this article (and many others)

Functional decomposition vs use case

Functional decomposition is not used when applied to use case
modeling.
This phrase is a copied one from a question paper.What is the meaning of that phrase.
This is not equal to the question What is Functional Decomposition?
Use cases deal with synthesizing functions, not decomposing them. A technician decomposes a system to find the pieces it is built of. A business analyst tries to stack single parts of a system on piles so they are formed around a single goal. A use case describes a unique single piece of added value a systems gives to an actor. Indeed, functional decomposition is what most people try when describing use cases: finding bits and pieces. They all (like me) come from the programming guild where this is the daily job. But UC synthesis is working the exact opposite!
I recommend reading Bittner/Spence who explain this very complex approach in detail and very nicely.

NLP of Legal Texts?

I have a corpus of a few 100-thousand legal documents (mostly from the European Union) – laws, commentary, court documents etc. I am trying to algorithmically make some sense of them.
I have modeled the known relationships (temporal, this-changes-that, etc). But on the single-document level, I wish I had better tools to allow fast comprehension. I am open for ideas, but here's a more specific question:
For example: are there NLP methods to determine the relevant/controversial parts of documents as opposed to boilerplate? The recently leaked TTIP papers are thousands of pages with data tables, but one sentence somewhere in there may destroy an industry.
I played around with google's new Parsey McParface, and other NLP solutions in the past, but while they work impressively well, I am not sure how good they are at isolating meaning.
In order to make sense out of documents you need to perform some sort of semantic analysis. You have two main possibilities with their exemples:
Use Frame Semantics:
http://www.cs.cmu.edu/~ark/SEMAFOR/
Use Semantic Role Labeling (SRL):
http://cogcomp.org/page/demo_view/srl
Once you are able to extract information from the documents then you may apply some post-processing to determine which information is relevant. Finding which information is relevant is task related and I don't think you can find a generic tool that extracts "the relevant" information.
I see you have an interesting usecase. You've also mentioned the presence of a corpus(which a really good plus). Let me relate a solution that I had sketched for extracting the crux from research papers.
To make sense out of documents, you need triggers to tell(or train) the computer to look for these "triggers". You can approach this using a supervised learning algorithm with a simple implementation of a text classification problem at the most basic level. But this would need prior work, help from domain experts initially for discerning "triggers" from the textual data. There are tools to extract gists of sentences - for example, take noun phrases in a sentence, assign weights based on co-occurences and represent them as vectors. This is your training data.
This can be a really good start to incorporating NLP into your domain.
Don't use triggers. What you need is a word sense disambiguation and domain adaptation. You want to make sense of is in the documents i.e understand the semantics to figure out the meaning. You can build a legal ontology of terms in skos or json-ld format represent it ontologically in a knowledge graph and use it with dependency parsing like tensorflow/parseymcparseface. Or, you can stream your documents in using a kappa based architecture - something like kafka-flink-elasticsearch with added intermediate NLP layers using CoreNLP/Tensorflow/UIMA, cache your indexing setup between flink and elasticsearch using redis to speed up the process. To understand relevancy you can apply specific cases from boosting in your search. Furthermore, apply sentiment analysis to work out intents and truthness. Your use case is one of an information extraction, summarization, and semantic web/linked data. As EU has a different legal system you would need to generalize first on what is really a legal document then narrow it down to specific legal concepts as they relate to a topic or region. You could also use here topic modelling techniques from LDA or Word2Vec/Sense2Vec. Also, Lemon might also help from converting lexical to semantics and semantics to lexical i.e NLP->ontology ->ontology->NLP. Essentially, feed the clustering into your classification of a named entity recognition. You can also use the clustering to assist you in building out the ontology or seeing what word vectors are in a document or set of documents using cosine similarity. But, in order to do all that it be best to visualize the word sparsity of your documents. Something like commonsense reasoning + deep learning might help in your case as well.

Grammatical framework GF and owl

I am interested in the filed on Computational Linguistics and NLP. I read a lot about Grammatical Framework (GF), which is divided into abstract syntax and concrete syntax. And I know a little bit about OWL, RDF and WordNet. I am confused about the differences between the 2 technologies.
Can we use GF rather than OWL as syntax builders?
Can we eliminate Parser by using GF?
Does GF contains all terms so we don't need to use WordNet?
One of the formal definitions of Grammatical Framework is:
Grammatical Framework (GF), grammaticalframework.org, is a multilingual grammar formalism based on the idea of a shared abstract syntax and mappings between the abstract syntax and concrete languages. GF has hundreds of users all over the world.
The way GF is connected to the Semantic Web is through lemon:
Lemon is a proposed model for modeling lexicon and machine-readable dictionaries and linked to the Semantic Web and the Linked Data cloud.It was designed to meet the following challenges:
RDF-native form to enable leverage of existing Semantic Web technologies (SPARQL, OWL, RIF etc.).
Linguistically sound structure based on LMF to enable conversion to existing offline formats.
Separation of the lexicon and ontology layers, to ensure compatability with existing OWL models.
Linking to data categories, in order to allow for arbitrarily complex linguistic description.
So to answer your first question, GF and OWL complement each other. GF is essentially a set of grammatical rules that can be mapped between languages, but depending on the task at hand, you can use GF to develop powerful Semantic Web tools. For example, GF can be used to verbalise ontologies, as it has been demonstrated in lemon papers.
For the second question, yes. Since the intermediate level of GF is a set of logical rules, you don't need a parser anymore. The morphology and basic syntax mapping can be enough (again, what is your goal? As the definition says, GF covers basic syntax.)
As for WordNet:
WordNet® is a large lexical database of English. Nouns, verbs, adjectives and adverbs are grouped into sets of cognitive synonyms (synsets), each expressing a distinct concept. Synsets are interlinked by means of conceptual-semantic and lexical relations.
WordNet can be perceived as an ontology, but it is not. It cannot even be called a linguistic ontology. Having hypernym and hyponym relations does not make a dataset into an ontology.
What lemon or ontolex are trying to achieve is to create an ontology that can be used for linguistic purposes. This purpose could be annotation, corpus study, modelling dictionaries, and etc. However, the power of WordNet lies within its synsets (Words from the same lexical category that are roughly synonymous are grouped into synsets.); but the power of RDF/OWL lies within inference.
In the 4 years since this question was first asked, there have been some updates in GF. Most importantly, we now have a WordNet ported into GF, currently for 13 languages, with full inflection tables. You can find the repository in https://github.com/GrammaticalFramework/gf-wordnet#readme and a multilingual web interface in http://www.grammaticalframework.org/~krasimir/gf-wordnet.html. Some examples how to use the interface:
English inflection table:
Finnish inflection table:

Resources