Source code for text summarization in Java


My project requires a text summarizer. Is there any source code for this in Java? Or, if I must build it myself, is there a book on the subject?

You will most likely have to build a text summarizer yourself, although there are some libraries, APIs and software packages that do text summarization; you could check them out to see whether any of them are useful.
Here is a good website I found that describes 30+ text summarizer libraries, APIs and software packages:
http://blog.mashape.com/post/58164039983/list-of-30-summarizer-apis-libraries-and-software
Hope this helps. Good Luck.
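If you do end up building one yourself, a common starting point is extractive summarization: score each sentence and keep the best ones in their original order. Below is a minimal sketch of a word-frequency heuristic, assuming plain English input. The class name, the naive regex sentence splitter and the tiny stopword list are my own simplifications for illustration, not taken from any of the libraries in that list.

```java
import java.util.*;
import java.util.stream.*;

/**
 * Minimal frequency-based extractive summarizer (illustrative sketch).
 * Sentences are scored by the average whole-text frequency of their
 * non-stopword terms; the top-k sentences are returned in document order.
 * The stopword list and sentence splitter are simplifying assumptions.
 */
final class FrequencySummarizer {

    // Tiny illustrative stopword list (assumption; use a proper list in practice).
    private static final Set<String> STOPWORDS = Set.of(
            "a", "an", "the", "and", "or", "of", "to", "in", "is", "are",
            "it", "this", "that", "for", "on", "with", "as", "be", "by");

    /** Naive sentence splitter: breaks after '.', '!' or '?' followed by whitespace. */
    static List<String> splitSentences(String text) {
        return Arrays.stream(text.trim().split("(?<=[.!?])\\s+"))
                .filter(s -> !s.isBlank())
                .collect(Collectors.toList());
    }

    /** Lowercases, splits on non-letters and drops stopwords. */
    static List<String> tokenize(String sentence) {
        return Arrays.stream(sentence.toLowerCase(Locale.ROOT).split("[^\\p{L}]+"))
                .filter(w -> !w.isEmpty() && !STOPWORDS.contains(w))
                .collect(Collectors.toList());
    }

    /** Returns the k highest-scoring sentences, kept in their original order. */
    static List<String> summarize(String text, int k) {
        List<String> sentences = splitSentences(text);

        // Count how often each content word occurs across the whole text.
        Map<String, Integer> freq = new HashMap<>();
        for (String s : sentences) {
            for (String w : tokenize(s)) {
                freq.merge(w, 1, Integer::sum);
            }
        }

        // Score each sentence by the mean frequency of its content words.
        double[] scores = new double[sentences.size()];
        for (int i = 0; i < sentences.size(); i++) {
            List<String> words = tokenize(sentences.get(i));
            scores[i] = words.isEmpty() ? 0.0
                    : words.stream().mapToInt(freq::get).sum() / (double) words.size();
        }

        // Take the indices of the top-k scores, then restore document order.
        return IntStream.range(0, sentences.size())
                .boxed()
                .sorted((a, b) -> Double.compare(scores[b], scores[a]))
                .limit(k)
                .sorted()
                .map(sentences::get)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String text = "Java is a popular language. Java runs on the JVM. "
                + "Cats sleep a lot. The JVM makes Java portable.";
        summarize(text, 2).forEach(System.out::println);
    }
}
```

Running `main` prints the two JVM sentences. Averaging rather than summing word frequencies keeps long sentences from winning just by being long; real summarizers go much further (TF-IDF weighting, position features, graph-based ranking and so on).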

Related

Answer extraction from unstructured text

I want to extract answers to a group of given questions from unstructured text. I searched for a library for this purpose but haven't found one.
P.S. I have used NLP tools/libraries such as NLTK, OpenNLP, etc.
Thanks in advance.

You are asking for a Question Answering (QA) toolkit. It is rare to find open source code or a ready-to-use system for this. You can look for scientific articles that describe a QA system and try to replicate it using the toolkits you mentioned above. These systems normally use an NER tagger, a POS tagger, coreference resolution, etc.
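As a first building block before adding those components, here is a minimal sketch of the sentence-retrieval step many QA pipelines start from: pick the sentence that shares the most content words with the question. The class name, the stopword list and the naive sentence splitter are my own assumptions; it deliberately uses no NER, POS tags or coreference, which is exactly what a real system would layer on top.

```java
import java.util.*;

/**
 * Deliberately simple answer-sentence retrieval baseline (illustrative
 * sketch): returns the sentence sharing the most content words with the
 * question. The stopword list is an assumption for illustration.
 */
final class OverlapAnswerFinder {

    private static final Set<String> STOPWORDS = Set.of(
            "what", "who", "where", "when", "which", "how", "why",
            "is", "are", "was", "were", "the", "a", "an", "of", "in", "on",
            "does", "do", "did");

    /** Lowercased words (letters/digits) minus stopwords. */
    static Set<String> contentWords(String text) {
        Set<String> words = new HashSet<>();
        for (String w : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!w.isEmpty() && !STOPWORDS.contains(w)) {
                words.add(w);
            }
        }
        return words;
    }

    /** Best-matching sentence, or an empty Optional if no sentence overlaps the question. */
    static Optional<String> bestSentence(String question, String text) {
        Set<String> q = contentWords(question);
        String best = null;
        int bestOverlap = 0;
        for (String sentence : text.split("(?<=[.!?])\\s+")) {
            Set<String> shared = contentWords(sentence);
            shared.retainAll(q);                     // keep only words also in the question
            if (shared.size() > bestOverlap) {       // strict '>' keeps the earliest on ties
                bestOverlap = shared.size();
                best = sentence;
            }
        }
        return Optional.ofNullable(best);
    }

    public static void main(String[] args) {
        String text = "Paris is the capital of France. The Seine flows through Paris. "
                + "Berlin is the capital of Germany.";
        System.out.println(bestSentence("What is the capital of Germany?", text).orElse("(no match)"));
    }
}
```

This prints "Berlin is the capital of Germany." It would still need an answer-extraction step (for example, an NER tagger looking for a LOCATION entity in the chosen sentence) to return just the answer.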

Generating docs from UML model with Rational Tools?

Does anyone know if there's a usable tool for generating RUP-style artifacts from a UML model in the Rational toolset (e.g. Rational Software Architect, Application Developer, etc.)?
Specifically, I need to be able to extract information from class (and potentially sequence) diagrams and create software design documents, preferably in Word (or maybe PDF).
I've tried BIRT and it's just not usable. Is there anything else out there that is?
Thanks

I haven't used it for a few years, but SoDA used to be the main way to generate docs with Rational tools. It wasn't free back then; I'm not sure about now.
That's the only 'out of the box' doc generator I know of. However, you should be able to use some or most of the Eclipse modelling tools to roll your own by extracting model info into an intermediate format and then generating docs from that. For example, you could:
Use Xtend2 to extract the model info and write it out as reStructuredText files.
Use Sphinx to generate HTML or PDF from the .rst files.
hth.
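The intermediate-format step can be sketched in plain Java too (Xtend compiles to Java, and its template expressions would make this more concise). A minimal sketch, assuming you have already pulled class names, descriptions, attributes and operations out of the model: the `UmlClass` and `UmlOperation` records below are hypothetical placeholders, not the EMF/UML2 or Rational API. It emits reStructuredText that Sphinx can then build into HTML or PDF.

```java
import java.util.List;

/**
 * Renders extracted model info as reStructuredText (illustrative sketch).
 * UmlClass and UmlOperation are HYPOTHETICAL stand-ins for whatever you
 * read out of the model; they are not part of any real modelling API.
 */
final class RstDocWriter {

    record UmlOperation(String name, String returnType, String doc) {}

    record UmlClass(String name, String doc, List<String> attributes, List<UmlOperation> operations) {}

    /** Title underlined with the given character, at least as long as the title. */
    static String heading(String title, char underline) {
        return title + "\n" + String.valueOf(underline).repeat(title.length()) + "\n\n";
    }

    static String render(List<UmlClass> classes) {
        StringBuilder rst = new StringBuilder(heading("Software Design Document", '='));
        for (UmlClass c : classes) {
            rst.append(heading(c.name(), '-'));        // one section per class
            rst.append(c.doc()).append("\n\n");
            if (!c.attributes().isEmpty()) {
                rst.append("Attributes:\n\n");
                for (String a : c.attributes()) {
                    rst.append("* ``").append(a).append("``\n");
                }
                rst.append("\n");
            }
            if (!c.operations().isEmpty()) {
                rst.append("Operations:\n\n");
                for (UmlOperation op : c.operations()) {
                    rst.append("* ``").append(op.name()).append("() : ").append(op.returnType())
                       .append("`` - ").append(op.doc()).append("\n");
                }
                rst.append("\n");
            }
        }
        return rst.toString();
    }

    public static void main(String[] args) {
        UmlClass account = new UmlClass("Account", "Represents a customer account.",
                List.of("id : long", "balance : BigDecimal"),
                List.of(new UmlOperation("close", "void", "Closes the account.")));
        System.out.print(render(List.of(account)));
    }
}
```

Writing the output to an `.rst` file inside a Sphinx project and running the usual Sphinx build would then give you the HTML or PDF version.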

There is the Rational Publishing Engine.
I'm not sure how closely this resembles either BIRT or SoDA, whether it's a rehash or a from-scratch implementation or what, but it's what's supported by IBM at the present time.
I have no first-hand experience with it, but I have a colleague who does and he seems to like it.

Natural language de-identification

I am looking for a natural language tool that can automatically de-identify English text. For example, every email address should be renamed or obscured. Proper names should also be de-identified, as should addresses and the like.
There is the MITRE Identification Scrubber Toolkit, but I don't know how well it works.
My questions:
Are there any other tools out there?
Does anyone have experience with the MITRE tool? How well does it work?
Thanks.

De-identification (perhaps more often referred to as anonymization) is a very active research area as its success is obviously a requirement for the use of authentic text corpora in such fields as NLP for healthcare, medicine and the like. I recommend that you look at the tools listed in the answer to this question on CrossValidated. If you follow the links further, you will find research papers describing how these tools work with further references and results evaluations.
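For the easy part of the question, pattern-shaped identifiers such as email addresses, a regex pass is a common first step; proper names and street addresses need NER or a trained tool like the ones above. A minimal sketch, assuming a pragmatic email regex of my own (not a full RFC 5322 parser, and not how the MITRE toolkit works):

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Pattern;

/**
 * Replaces email addresses with numbered placeholders (illustrative sketch).
 * The regex is an assumption that covers common addresses only.
 */
final class EmailScrubber {

    // Pragmatic approximation of an email address (assumption, not RFC 5322).
    private static final Pattern EMAIL =
            Pattern.compile("[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}");

    /**
     * The same address (compared case-insensitively) always maps to the same
     * placeholder, so repeated mentions stay linked after de-identification.
     */
    static String scrub(String text) {
        Map<String, String> aliases = new LinkedHashMap<>();
        return EMAIL.matcher(text).replaceAll(m ->
                aliases.computeIfAbsent(m.group().toLowerCase(Locale.ROOT),
                        k -> "[EMAIL_" + (aliases.size() + 1) + "]"));
    }

    public static void main(String[] args) {
        System.out.println(scrub("Contact john.doe@example.com or JOHN.DOE@example.com, or jane@test.org."));
    }
}
```

This prints "Contact [EMAIL_1] or [EMAIL_1], or [EMAIL_2]." Using consistent placeholders rather than a single "[EMAIL]" token is a design choice: it preserves who-wrote-to-whom structure that downstream NLP may still need.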

Text mining library or linguistic library?

I have a bunch of data harvested from a forum I own and would like to do some text mining, or use a linguistic library, to extract useful information.
Any text mining or data mining library in any language will do.
Thank you.

I recommend that you have a look at R. It has a large number of text mining packages: have a look at the Natural Language Processing task view. In particular, look at the tm package. Here are some relevant links:
Paper about the package in the Journal of Statistical Software: http://www.jstatsoft.org/v25/i05/paper. The paper includes a nice example of an analysis of the R-devel mailing list (https://stat.ethz.ch/pipermail/r-devel/) postings from 2006.
Package homepage: http://cran.r-project.org/web/packages/tm/index.html
Look at the introductory vignette: http://cran.r-project.org/web/packages/tm/vignettes/tm.pdf
Another useful package for this is Gary King's readme package.

You may like to have a look at the Python NLTK (Natural Language Toolkit): it's specifically designed for this kind of thing.
There is also a great book you can buy to get you started.

MALLET is a Java library designed for text mining. Once you have preprocessed the text data, a general data mining tool like Weka would also suffice for your task.
If you have access to SPSS or SAS, their products should be easier to use.
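"Preprocessed" here usually means turning each post into a numeric vector that a tool like Weka can learn from (Weka's own StringToWordVector filter can also do this step for you). A minimal bag-of-words sketch in plain Java, assuming lowercase letter-only tokens and a crude minimum-length filter of my own choosing in place of a real stopword list:

```java
import java.util.*;

/**
 * Minimal bag-of-words preprocessing (illustrative sketch): tokenizes,
 * builds a sorted shared vocabulary and turns each document into a
 * term-count vector. The 3-character minimum is an assumption.
 */
final class BagOfWords {

    /** Lowercased letter-only tokens of length >= 3. */
    static List<String> tokens(String doc) {
        List<String> out = new ArrayList<>();
        for (String t : doc.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (t.length() >= 3) {          // crude noise filter (assumption)
                out.add(t);
            }
        }
        return out;
    }

    /** Sorted vocabulary over all documents, so column order is stable. */
    static List<String> vocabulary(List<String> docs) {
        SortedSet<String> vocab = new TreeSet<>();
        for (String d : docs) {
            vocab.addAll(tokens(d));
        }
        return new ArrayList<>(vocab);
    }

    /** One row per document, one column per vocabulary term (raw counts). */
    static int[][] countMatrix(List<String> docs, List<String> vocab) {
        Map<String, Integer> column = new HashMap<>();
        for (int j = 0; j < vocab.size(); j++) {
            column.put(vocab.get(j), j);
        }
        int[][] m = new int[docs.size()][vocab.size()];
        for (int i = 0; i < docs.size(); i++) {
            for (String t : tokens(docs.get(i))) {
                Integer j = column.get(t);
                if (j != null) {
                    m[i][j]++;
                }
            }
        }
        return m;
    }

    public static void main(String[] args) {
        List<String> posts = List.of("Great forum, great people!", "The forum search is broken.");
        List<String> vocab = vocabulary(posts);
        System.out.println(vocab);
        for (int[] row : countMatrix(posts, vocab)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

Each row can then be exported (for example as CSV) and loaded into a general data mining tool. Note that "the" survives the length filter here, which is why a real pipeline would add a stopword list and usually TF-IDF weighting.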

Try GATE: it has a GUI, and of course you can use its Java API for more power:
http://gate.ac.uk/family/developer.html
You can also use Weka for processing text and doing text mining, have a look at these useful lectures:
http://sentimentmining.net/weka/

Stanford CoreNLP is good for English text and includes features such as named entity recognition. Take a look at: http://nlp.stanford.edu/software/corenlp.shtml
GATE, which Ehsan already recommended, is also good, but it can be a bit complicated if you need to write your own components. For large-scale work it's great, though.
UIMA (http://uima.apache.org) is similar to GATE, but not as easy to use because it doesn't feature an extensive GUI like GATE's.

I would recommend the following Python libraries:
nltk
keras
tensorflow
Note: before any text analysis, you should clean the data according to your requirements.
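What "cleaning" means depends on the data, but for scraped forum posts a typical first pass looks something like the sketch below. The steps chosen (strip HTML tags, drop URLs, collapse whitespace, lowercase) are my own assumptions, and regex tag stripping is only adequate for simple markup, not a substitute for a real HTML parser.

```java
import java.util.Locale;
import java.util.regex.Pattern;

/**
 * Minimal cleaning pass for scraped forum posts (illustrative sketch).
 * The chosen steps are assumptions; adapt them to your own requirements.
 */
final class PostCleaner {

    private static final Pattern HTML_TAG = Pattern.compile("<[^>]+>");
    private static final Pattern URL = Pattern.compile("https?://\\S+");
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    static String clean(String raw) {
        String s = HTML_TAG.matcher(raw).replaceAll(" ");  // tags -> space so words don't fuse
        s = URL.matcher(s).replaceAll(" ");                 // drop bare links
        s = WHITESPACE.matcher(s).replaceAll(" ").trim();   // collapse runs of whitespace
        return s.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(clean("<p>Check <b>this</b> out: https://example.com/x   NOW</p>"));
    }
}
```

This prints "check this out: now". Whether to keep punctuation, emoticons or URLs is exactly the "based on your requirement" judgment: sentiment analysis, for instance, may want emoticons kept.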

Resources to learn Qt/Embedded 4.5?

Can you please give me resources (books, tutorials, other useful links) to learn Qt/Embedded 4.5, so I can get a quick start in Qt programming?
P.S. I am quite familiar with windowing system programming and C++.

You can have a look at the official documentation: http://qt.nokia.com/doc/4.5/qt-embedded.html. Beyond that, Qt on an embedded platform is the same as on any other platform, so the "standard" documentation is great: http://qt.nokia.com/doc/4.5/index.html.

BOOK
IMHO the best beginners' book is Foundations of Qt Development by Johan Thelin.
http://www.apress.com/book/view/1590598318
(see also Google Books for a good preview)
The first 4 chapters (120 pages) are a great hands-on introduction to all you need to know to start developing applications in Qt.
The following 12 chapters go into detail on specific topics (e.g. files and XML, databases, threading, networking, project building, drawing and printing, ...).
CODE EXAMPLES
Once you read the first 4 chapters, you'll be able to easily navigate Qt's excellent documentation.
At that point you can dive into the examples Qt provides (on Mac OS X they get installed in /Developer/Examples/Qt). There are many, so you're bound to find something relevant. All are of high quality, and some have a walk-through documentation.
Have fun!

For getting started with Qt programming, there is an excellent book you should download immediately. It is titled "C++ GUI Programming with Qt 4, Second Edition". Google it and find a copy in either PDF or CHM format. Yes, it is perfectly legal.
Search for this term and you should be able to find it:
"GUI Programming With Qt 4 2nd Edition.chm"
or change the file type to PDF. It's out there. My understanding is that it is a perfectly legal download, but I didn't bookmark where I got it.
http://qt.nokia.com/developer/books
has a listing of good books you can purchase.

Go to this page and hunt up the book download:
http://dcsoft.com/community_server/blogs/dcsoft/archive/2009/03/24/book-review-c-gui-programming-with-qt-4-second-edition-by-jasmin-blanchette-and-mark-summerfield-prentice-hall.aspx
