Answers extraction from an unstructured text - nlp

I want to extract some answers for a group of given questions from an unstructured text. I searched a library for this proposal but I haven't found it.
p.s. I have used NLP tools/libraries, such as NLTK, OpenNLP, etc...
thanks in advance

You ask for a Question Answering (QA) toolkit. Its rarely to find an open source code or a ready to use system. You can look for scientific articles that describe a QA system and try to replicate it from using the toolkits that you mentioned above. Normally these systems use NER taggger, POS, coreference resolution, etc..

Related

Natural language de-identification

I am looking for a natural language tool that can automatically de-identify English text. For example, every email address should be renamed or obscured. But proper names should be de-identified, as should addresses and what not.
There is a MITRE Identification Scrubber Toolkit. I don't know how well it works.
My questions:
Are there any other tools out there?
Does anyone have experience with the MITRE tool? How well does it work?
Thanks.
De-identification (perhaps more often referred to as anonymization) is a very active research area as its success is obviously a requirement for the use of authentic text corpora in such fields as NLP for healthcare, medicine and the like. I recommend that you look at the tools listed in the answer to this question on CrossValidated. If you follow the links further, you will find research papers describing how these tools work with further references and results evaluations.

Generating questions from text (NLP)

What approaches are there to generating question from a sentence? Let's say I have a sentence "Jim's dog was very hairy and smelled like wet newspaper" - which toolkit is capable of generating a question like "What did Jim's dog smelled like?" or "How hairy was Jim's dog?"
Thanks!
Unfortunately there isn't one, exactly. There is some code written as part of Michael Heilman's PhD dissertation at CMU; perhaps you'll find it and its corresponding papers interesting?
If it helps, the topic you want information on is called "question generation". This is pretty much the opposite of what Watson does, even though "here is an answer, generate the corresponding question" is exactly how Jeopardy is played. But actually, Watson is a "question answering" system.
In addition to the link to Michael Heilman's PhD provided by dmn, I recommend checking out the following papers:
Automatic Question Generation and Answer Judging: A Q&A Game for Language Learning (Yushi Xu, Anna Goldie, Stephanie Seneff)
Automatic Question Generationg from Sentences (Husam Ali, Yllias Chali, Sadid A. Hasan)
As of 2022, Haystack provides a comprehensive suite of tools to accomplish the purpose of Question generation and answering using the latest and greatest Transformer models and Transfer learning.
From their website,
Haystack is an open-source framework for building search systems that work intelligently over large document collections. Recent advances in NLP have enabled the application of question answering, retrieval and summarization to real world settings and Haystack is designed to be the bridge between research and industry.
NLP for Search: Pick components that perform retrieval, question answering, reranking and much more
Latest models: Utilize all transformer based models (BERT, RoBERTa, MiniLM, DPR) and smoothly switch when new ones get published
Flexible databases: Load data into and query from a range of databases such as Elasticsearch, Milvus, FAISS, SQL and more
Scalability: Scale your system to handle millions of documents and deploy them via REST API
Domain adaptation: All tooling you need to annotate examples, collect user-feedback, evaluate components and finetune models.
Based on my personal experience, I am 95% successful in generating Questions and Answers in my Internship for training purposes. I have a sample web user interface to demonstrate and the code too. My Web App and Code.
Huge shoutout to the developers on the Slack channel for helping noobs in AI like me! Implementing and deploying a NLP model has never been easier if not for Haystack. I believe this is the only tool out there where one can easily develop and deploy.
Disclaimer: I do not work for deepset.ai or Haystack, am just a fan of haystack.
As of 2019, Question generation from text has become possible. There are several research papers for this task.
The current state-of-the-art question generation model uses language modeling with different pretraining objectives. Research paper, code implementation and pre-trained model are available to download on the Paperwithcode website link.
This model can be used to fine-tune on your own dataset (instructions for finetuning are given here).
I would suggest checking out this link for more solutions. I hope it helps.

Simple toolkits for emotion (sentiment) analysis (not using machine learning)

I am looking for a tool that can analyze the emotion of short texts. I searched for a week and I couldn't find a good one that is publicly available. The ideal tool is one that takes a short text as input and guesses the emotion. It is preferably a standalone application or library.
I don't need tools that is trained by texts. And although similar questions are asked before no satisfactory answers are got.
I searched the Internet and read some papers but I can't find a good tool I want. Currently I found SentiStrength, but the accuracy is not good. I am using emotional dictionaries right now. I felt that some syntax parsing may be necessary but it's too complex for me to build one. Furthermore, it's researched by some people and I don't want to reinvent the wheels. Does anyone know such publicly/research available software? I need a tool that doesn't need training before using.
Thanks in advance.
I think that you will not find a more accurate program than SentiStrength (or SoCal) for this task - other than machine learning methods in a specific narrow domain. If you have a lot (>1000) of hand-coded data for a specific domain then you might like to try a generic machine learning approach based on your data. If not, then I would stop looking for anything better ;)
Identifying entities and extracting precise information from short texts, let alone sentiment, is a very challenging problem specially with short text because of lack of context. Hovewer, there are few unsupervised approaches to extracting sentiments from texts mainly proposed by Turney (2000). Look at that and may be you can adopt the method of extracting sentiments based on adjectives in the short text for your use-case. It is hovewer important to note that this might require you to efficiently POSTag your short text accordingly.
Maybe EmoLib could be of help.

Natural Language Processing Algorithm for mood of an email

One simple question (but I haven't quite found an obvious answer in the NLP stuff I've been reading, which I'm very new to):
I want to classify emails with a probability along certain dimensions of mood. Is there an NLP package out there specifically dealing with this? Is there an obvious starting point in the literature I start reading at?
For example, if I got a short email something like "Hi, I'm not very impressed with your last email - you said the order amount would only be $15.95! Regards, Tom" then it might get 8/10 for Frustration and 0/10 for Happiness.
The actual list of moods isn't so important, but a short list of generally positive vs generally negative moods would be useful.
Thanks in advance!
--Trindaz on Fedang #NLP
You can do this with a number of different NLP tools, but nothing to my knowledge comes with it ready out of the box. Perhaps the easiest place to start would be with LingPipe (java), and you can use their very good sentiment analysis tutorial. You could also use NLTK if python is more your bent. There are some good blog posts over at Streamhacker that describe how you would use Naive Bayes to implement that.
Check out AlchemyAPI for sentiment analysis tools and scikit-learn or any other open machine learning library for the classifier.
if you have not decided to code the implementation, you can also have the data classified by some other tool. google prediction api may be an alternative.
Either way, you will need some labeled data and do the preprocessing. But if you use a tool that may help you get better accuracy easily.

Should I use LingPipe or NLTK for extracting names and places?

I'm looking to extract names and places from very short bursts of text example
"cardinals vs jays in toronto"
" Daniel Nestor and Nenad Zimonjic play Jonas Bjorkman w/ Kevin Ullyett, paris time to be announced"
"jenson button - pole position, brawn-mercedes - monaco".
This data is currently in a MySQL database, and I (pretty much) have a separate record for each athlete, though names are sometimes spelled wrong, etc.
I would like to extract the athletes and locations.
I usually work in PHP, but haven't been able to find a library for entity extraction (and I may want to get deeper into some NLP and ML in the future).
From what I've found, LingPipe and NLTK seem to be the most recommended, but I can't figure out if either will really suit my purpose, or if something else would be better.
I haven't programmed in either Java or Python, so before I start learning new languages, I'm hoping to get some advice on what route I should follow, or other recommendations.
What you're describing is named entity recognition. So I'd recommend checking out the other questions regarding this topic if you haven't already seen them. This looks like the most useful answer to me.
I can't really comment about whether NLTK or LingPipe is best suited for this task although from looking at the answers it looks like there's quite a few other resources written in Java.
One advantage of going with NLTK is that Python is very accessible as a language. The other advantage is that the NLTK book (which is available for free) offers an introduction to both Python and NLTK at the same time, which would be useful for you.

Resources