Tools to generate concepts and a concept graph for searched articles

When you search for a paper in an online library such as Springer, the returned result also shows related concepts automatically extracted from the paper, as well as a knowledge-relationship graph based on those concepts. The following is a screenshot of the search output.
I would like to know what kinds of algorithms and software are able to generate this kind of output. Are there any open-source tools that can do this?

The algorithm being used is most likely k-means, an unsupervised clustering algorithm. Articles are clustered by topic; some articles contain multiple topics, many of which are shared between articles. Those shared topics then become the branches emerging from the initial topic. scikit-learn is a great Python library that does clustering very well, and R is also great for clustering. Hope this helps!
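For a quick experiment, here is a minimal sketch of that idea with scikit-learn. The sample abstracts, the cluster count, and the use of top TF-IDF terms as "concepts" are illustrative assumptions, not a description of what Springer actually runs:

```python
# Cluster article abstracts by topic with k-means over TF-IDF vectors.
# The abstracts and the number of clusters are made up for illustration.
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

abstracts = [
    "Deep learning methods for image classification.",
    "Convolutional networks applied to object detection.",
    "Index structures for full-text search engines.",
    "Inverted indexes and query evaluation in retrieval systems.",
]

vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(abstracts)

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)  # cluster id per abstract, e.g. [0 0 1 1]

# The highest-weighted terms in each cluster centre can serve as its "concepts".
terms = vectorizer.get_feature_names_out()
for c in range(2):
    top = km.cluster_centers_[c].argsort()[::-1][:3]
    print(c, [terms[i] for i in top])
```

Shared top terms between clusters would then give you the edges of a concept graph.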

Related

Tools for Visualising Algorithms

I would like to develop some visualisations for various string matching algorithms. Ideally, once a visualisation has been developed, I should be able to interact with it, for instance by experimenting with different inputs to see how they affect the algorithm. Can anyone suggest the best tool for creating these visualisations?
I've been told that Mathematica could be used for visualising algorithms; has anyone had much experience doing this? How well suited would Mathematica be to visualising a string matching algorithm?
If you can code in JavaScript, d3.js is an amazing data visualization library.
Here's an example of a visualization of an algorithm to generate Hamiltonian graphs. It was built using d3.
Here's another example visualizing min-heap generation.
You can find a lot of visualizations here:
http://www.comp.nus.edu.sg/~stevenha/visualization/
Source: Competitive Programming 3 by Steven Halim and Felix Halim

How to design a full-text indexing system?

Lucene is a great open-source indexing library. My problem is not how to use this kind of indexing tool, but how to learn and understand the way such tools are designed.
Maybe I should read the source code of Lucene, but I can't seem to find any tutorial about how this great work is done.
So, is there any other way, or a book, that can help me gain a concrete understanding of how to design such an indexing system?
Thank you.
The science behind Lucene is called Information Retrieval. Once you start appreciating the algorithms and data structures behind Information Retrieval, you are all set, and Lucene or Sphinx become merely tools for solving your tasks. The very first thing to study is the inverted index data structure.
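To make the idea concrete, here is a toy inverted index in Python. The documents and the whitespace tokenizer are made up for illustration; a real system like Lucene adds tokenization, stemming, ranking, and compressed postings lists on top:

```python
from collections import defaultdict

# Toy inverted index: map each term to the set of document ids containing it.
docs = {
    1: "lucene is an open source indexing library",
    2: "an inverted index maps terms to the documents containing them",
}

index = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.lower().split():  # a real system tokenizes, stems, etc.
        index[term].add(doc_id)

# Conjunctive query: intersect the postings lists of the query terms.
print(index["an"] & index["indexing"])  # {1}
```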
A great book about Information Retrieval algorithms and data structures is available at http://nlp.stanford.edu/IR-book/. This Stanford text is a good starting point for learning how Information Retrieval systems are designed.

Generating questions from text (NLP)

What approaches are there to generating questions from a sentence? Let's say I have the sentence "Jim's dog was very hairy and smelled like wet newspaper". Which toolkit is capable of generating a question like "What did Jim's dog smell like?" or "How hairy was Jim's dog?"
Thanks!
Unfortunately there isn't one, exactly. There is some code written as part of Michael Heilman's PhD dissertation at CMU; perhaps you'll find it and its corresponding papers interesting?
If it helps, the topic you want information on is called "question generation". This is pretty much the opposite of what Watson does: even though "here is an answer, generate the corresponding question" is exactly how Jeopardy! is played, Watson is actually a "question answering" system.
In addition to the link to Michael Heilman's PhD provided by dmn, I recommend checking out the following papers:
Automatic Question Generation and Answer Judging: A Q&A Game for Language Learning (Yushi Xu, Anna Goldie, Stephanie Seneff)
Automatic Question Generation from Sentences (Husam Ali, Yllias Chali, Sadid A. Hasan)
As of 2022, Haystack provides a comprehensive suite of tools for question generation and answering using the latest Transformer models and transfer learning.
From their website:
Haystack is an open-source framework for building search systems that work intelligently over large document collections. Recent advances in NLP have enabled the application of question answering, retrieval and summarization to real world settings and Haystack is designed to be the bridge between research and industry.
NLP for Search: Pick components that perform retrieval, question answering, reranking and much more
Latest models: Utilize all transformer based models (BERT, RoBERTa, MiniLM, DPR) and smoothly switch when new ones get published
Flexible databases: Load data into and query from a range of databases such as Elasticsearch, Milvus, FAISS, SQL and more
Scalability: Scale your system to handle millions of documents and deploy them via REST API
Domain adaptation: All tooling you need to annotate examples, collect user-feedback, evaluate components and finetune models.
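Here is a minimal sketch of question generation with Haystack. The class names and output key follow the Haystack 1.x tutorials as I recall them and may differ in newer releases:

```python
# Question generation with Haystack 1.x; imports and the output key below
# follow the Haystack 1.x tutorials and may have changed in later versions.
from haystack import Document
from haystack.nodes import QuestionGenerator
from haystack.pipelines import QuestionGenerationPipeline

docs = [Document(content="Jim's dog was very hairy and smelled like wet newspaper.")]

pipeline = QuestionGenerationPipeline(QuestionGenerator())
result = pipeline.run(documents=docs)
print(result["generated_questions"])
```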
Based on my personal experience, I was about 95% successful in generating questions and answers during my internship, where I used them for training purposes. I have a sample web user interface to demonstrate, and the code too: My Web App and Code.
Huge shoutout to the developers on the Slack channel for helping AI noobs like me! Implementing and deploying an NLP model has never been easier, thanks to Haystack. I believe this is the only tool out there with which one can easily develop and deploy.
Disclaimer: I do not work for deepset.ai or Haystack; I am just a fan of Haystack.
As of 2019, question generation from text has become practical, and there are several research papers on the task.
The current state-of-the-art question generation model uses language modeling with different pretraining objectives. The research paper, code implementation, and pre-trained model are available to download on the Papers with Code website link.
This model can be used to fine-tune on your own dataset (instructions for finetuning are given here).
I would suggest checking out this link for more solutions. I hope it helps.

Analysing meaning of sentences

Are there any tools that analyze the meaning of given sentences? Recommendations are greatly appreciated.
Thanks in advance!
I am also looking for similar tools. One thing I found recently is this sentiment analysis tool built by researchers at Stanford.
It provides a model for analyzing the sentiment of a given sentence. It's interesting that even this seemingly simple idea is quite involved to model accurately; it uses machine learning to achieve higher accuracy. There is a live demo where you can input sentences to analyze.
http://nlp.stanford.edu/sentiment/
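The Stanford demo is web-based; for a quick local experiment you could try NLTK's VADER analyzer instead (a different, lexicon-based tool, not the Stanford recursive neural model):

```python
# Lexicon-based sentiment scoring with NLTK's VADER analyzer.
import nltk
nltk.download("vader_lexicon", quiet=True)
from nltk.sentiment import SentimentIntensityAnalyzer

sia = SentimentIntensityAnalyzer()
print(sia.polarity_scores("This movie was surprisingly wonderful!"))
# -> {'neg': 0.0, 'neu': ..., 'pos': ..., 'compound': ...}
```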
I also came across RelEx, a semantic dependency relationship extractor.
http://wiki.opencog.org/w/Sentence_algorithms
Some natural language understanding tools can analyze the meaning of sentences, including NLTK and Attempto Controlled English. There are several implementations of discourse representation structures and semantic parsers with a similar purpose.
There are also several parsers that can be used to generate a meaning representation from the text that is being parsed.
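As a small illustration of such a semantic parser, NLTK's feature-grammar machinery can map a sentence to a first-order-logic meaning representation. This sketch uses the toy simple-sem.fcfg grammar shipped with the NLTK book data:

```python
# Parse a sentence into a first-order-logic meaning representation using the
# toy grammar from the NLTK book (chapter 10); requires the book_grammars data.
import nltk
nltk.download("book_grammars", quiet=True)

parser = nltk.load_parser("grammars/book_grammars/simple-sem.fcfg")
tokens = "Angus gives a bone to every dog".split()
for tree in parser.parse(tokens):
    print(tree.label()["SEM"])
    # a quantified formula, e.g.
    # all z2.(dog(z2) -> exists z1.(bone(z1) & give(angus,z1,z2)))
```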

NLP: Language Analysis Techniques and Algorithms

Situation:
I wish to perform a Deep-level Analysis of a given text, which would mean:
1. Ability to extract keywords and assign importance levels based on contextual usage.
2. Ability to draw conclusions about the mood expressed.
3. Ability to hint at the education level (Word does this a little, but I want something more automated).
4. Ability to mix and match phrases and find certain communication patterns.
5. Ability to draw substantial meaning out of the text, so that it can be quantified and processed for answering by a machine.
Question:
What kind of algorithms and techniques need to be employed for this?
Is there a software that can help me in doing this?
When you figure out how to do this, please contact DARPA, the CIA, the FBI, and all the other U.S. intelligence agencies. Contracts for projects like these are the subject of current research worth many millions in research grants. ;)
That being said, you'll need to process the text in layers and analyze it at each layer. For items 2 and 3, you'll find that training an SVM on n-tuples of words (try n = 3) helps. For items 1 and 4, you'll want deeper analysis: use a tool like NLTK, or one of the many other parsers, to find the subject words in sentences and their related words, and also use WordNet (from Princeton) to find the most common senses used and take those as keywords.
Item 5 is extremely challenging. I think intelligent use of the data above can give you what you want, but you'll need all your grammatical and programming knowledge, and the result will still be very coarse-grained.
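Here is a rough sketch of two of those layers; the texts, labels, and lookup word are made up for illustration:

```python
# Layer for items 2/3: a linear SVM over word n-grams (n up to 3), i.e. the
# "SVM on n-tuples" idea above. Texts and mood labels are invented examples.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

texts = ["I love this, it is wonderful", "This is terrible and sad",
         "What a great day", "I feel awful about this"]
moods = ["positive", "negative", "positive", "negative"]

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 3)), LinearSVC())
clf.fit(texts, moods)
print(clf.predict(["what a wonderful result"]))  # likely ['positive']

# Layer for item 1: WordNet orders a word's synsets by frequency, so the
# first synset approximates its most common sense.
import nltk
nltk.download("wordnet", quiet=True)
from nltk.corpus import wordnet as wn
print(wn.synsets("dog")[0].definition())
```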
It sounds like you might be open to some experimentation, in which case a toolkit approach might be best. If so, look at the NLTK Natural Language Toolkit for Python. It is open source under the Apache license, and there are a couple of excellent books about it (including one from O'Reilly, which is also released online under a Creative Commons license).
