How to Detect Present Simple Tense in English Sentences using a Rule-Based Approach?

How to Detect Present Simple Tense in English Sentences using a Rule-Based Approach? - nlp

I have a straightforward task of determining the sentence structure, specifically to identify if a sentence written in plain English is in the "present simple" tense. I am aware of a couple of libraries that could help with this task:
OpenNLP
CoreNLP
However, it seems that both of these libraries use machine learning in the background and require pre-trained language models. I am looking for a more lightweight solution, possibly using a rule-based approach. Is it possible to use OpenNLP or CoreNLP without machine learning for my task?

Related

Could I use OpenNLP without machine learning?

I have a straightforward task of determining the sentence structure, specifically to identify if a sentence written in plain English is in the "present simple" tense and I want to write a small java program without using machine learning.
I know that I can use OpenNLP library for that, but it seems that library requires already trained model to check plain text.
Can I use OpenNLP library without machine-learning? I have heard about rule-based approach, could it help here?

Can anyone give a brief overview of how to proceed with Named Entity Recognition in Tamil Language?

I have looked at Stanford NER and Polyglot. Both does not support Tamil Language.
I would like to use ML along with some rule based NLP processing to do the entity recognition

Neither Stanford NER nor Polyglot are rule-based. If you are only considering rule-based systems, you should probably look for existing frameworks that process Tamil correctly, or head to generic ones (e.g. GATE).
Have a look at this paper that report existing NER systems for Tamil, you may contact authors.
If you find no system available, it should be rather easy to train one using existing datasets such as NER-FIRE2013 and NER-FIRE2014: ask organizers how it would be possible to obtain access to those datasets.
Hope this helps!

What's the difference between Stanford Tagger, Parser and CoreNLP?

I'm currently using different tools from Stanford NLP Group and trying to understand the differences between them. It seems to me that somehow they intersect each other, since I can use same features in different tools (e.g. tokenize, and POS-Tag a sentence can be done by Stanford POS-Tagger, Parser and CoreNLP).
I'd like to know what's the actual difference between each tool and in which situations I should use each of them.

All Java classes from the same release are the same, and, yes, they overlap. On a code basis, the parser and tagger are basically subsets of what is available in CoreNLP, except that they do have a couple of little add-ons of their own, such as the GUI for the parser. In terms of provided models, the parser and tagger come with models for a range of languages, whereas CoreNLP ships only with English out of the box. However, you can then download language-particular jars for CoreNLP which provide all the models we have for different languages. Anything that is available in any of the releases is present in the CoreNLP github site: https://github.com/stanfordnlp/CoreNLP

Accuracy: ANNIE vs Stanford NLP vs OpenNLP with UIMA

My work is planning on using a UIMA cluster to run documents through to extract named entities and what not. As I understand it, UIMA have very few NLP components packaged with it. I've been testing GATE for awhile now and am fairly comfortable with it. It does ok on normal text, but when we run it through some representative test data, the accuracy drops way down. The text data we have internally is sometimes all caps, sometimes all lowercase, or a mix of the two in the same document. Even using ANNIE's all caps rules, the accuracy still leaves much to be desired. I've recently heard of Stanford NLP and OpenNLP but haven't had time to extensively train and test them. How do those two compare in terms of accuracy with ANNIE? Do they work with UIMA like GATE does?
Thanks in advance.

It's not possible/reasonable to give a general estimate on performance of these systems. As you said, on your test data the accuracy declines. That's for several reasons, one is the language characteristics of your documents, another is characteristics of the annotations you are expecting to see. Afaik for every NER task there are similar but still different annotation guidelines.
Having that said, on your questions:
ANNIE is the only free open source rule-based NER system in Java I could find. It's written for news articles and I guess tuned for the MUC 6 task. It's good for proof of concepts, but getting a bit outdated. Main advantage is that you can start improving it without any knowledge in machine learning, nlp, well maybe a little java. Just study JAPE and give it a shot.
OpenNLP, Stanford NLP, etc. come by default with models for news articles and perform (just looking at results, never tested them on a big corpus) better than ANNIE. I liked the Stanford parser better than OpenNLP, again just looking at documents, mostly news articles.
Without knowing what your documents look like I really can't say much more. You should decide if your data is suitable for rules or you go the machine learning way and use OpenNLP or Stanford parser or Illinois tagger or anything. The Stanford parser seems more appropriate for just pouring your data, training and producing results, while OpenNLP seems more appropriate for trying different algorithms, playing with parameters, etc.
For your GATE over UIMA dispute, I tried both and found more viral community and better documentation for GATE. Sorry for giving personal opinions :)

Just for the record answering the UIMA angle: For both Stanford NLP and OpenNLP, there is excellent packaging as UIMA analysis engines available via the DKPro Core project.

I would like to add one more note. UIMA and GATE are two frameworks for the creation of Natural Language Processing(NLP) applications. However, Name Entity Recognition (NER) is a basic NLP component and you can find an implementation of NER, independent of UIMA and GATE. The good news is you can usually find a wrapper for a decent NER in the UIMA and GATE. To make it clear let see this example:
OpenNLP NER
A wrapper for OpenNLP NER in GATE
A wrapper for OpenNLP NER in UIMA
It is the same for the Stanford NER component.
Coming back to your question, this website lists the state of the art NERs:
http://www.aclweb.org/aclwiki/index.php?title=Named_Entity_Recognition_(State_of_the_art)
For example, in the MUC-7 competition, best participant named LTG got the result with the accuracy of 93.39%.
http://www.aclweb.org/aclwiki/index.php?title=MUC-7_(State_of_the_art)
Note that if you want to use such a state of are implementation, you may have some issue with their license.

Natural Language Processing Package

I have started working on a project which requires Natural Language Processing. We have do the spell checking as well as mapping sentences to phrases and their synonyms. I first thought of using GATE but i am confused on what to use? I found an interesting post here which got me even more confused.
http://lordpimpington.com/codespeaks/drupal-5.1/?q=node/5
Please help me decide on what suits my purpose the best. I am working a web application which will us this NLP tool as a service.

You didn't really give much info, but try this: http://www.nltk.org/
I don't think NLTK does spell checking (I could be wrong on this), but it can do parts of speech tagging for text input.
For finding/matching synonyms you could use something like WordNet http://wordnet.princeton.edu/
If you're doing something really domain specific: I would recommend coming up with your own ontology for domain specific terms.

If you are using Python you can develop a spell checker with Python Enchant.
NLTK is good for developing Sentiment Analysis system too. I have some prototypes of the same too
Jaggu

If you are using deep learning based models, and if you have sufficient data, you can implement task specific models for any purpose. With the development of deep leaning based languages models, you can used word embedding based models with lexicon resources to obtain synonyms and antonyms. You can also follow the links below to obtain more resources.
https://stanfordnlp.github.io/CoreNLP/
https://www.nltk.org/
https://wordnet.princeton.edu/

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string