Recommendations for open-source text indexing and search [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 7 years ago.
I just discovered Lucene (a Java library) and am starting to read up on it.
I'm interested in taking some works of literature (for example, Philo, Josephus), indexing them, and then doing the following types of analysis (similar to what some Bible software programs do):
1) find word x within 2 or 3 words of word y
2) find "work* of * hand*" - would find "works of your hands", "work of his hand" etc...
3) find literary patterns (also called "motifs"), such as an author's repeated use of the phrase "in that day". (I think this might be the trickiest; I might have to find all combinations of 2-7 word phrases, then count and rank them, showing only the top 25, for example.) This might show, for example, that Josephus likes to use one set of phrases and Philo another.
Are there any open-source libraries that you would recommend?
My language preferences would probably be 1) Python, 2) C#, 3) Java.
Ideally no dependencies on any proprietary database.
Thanks,
Neal
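The phrase-ranking idea in item 3 can be prototyped with the Python standard library before committing to an indexing library. This is only a rough sketch under assumptions: the function names, the sample text and the simple ASCII tokenizer are made up for illustration, and a real corpus would need better tokenization and probably stop-word handling.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens (ASCII letters only)."""
    return re.findall(r"[a-z']+", text.lower())


def top_phrases(text, min_n=2, max_n=7, top=25):
    """Count every phrase of min_n..max_n consecutive words and return the most common."""
    words = tokenize(text)
    counts = Counter()
    for n in range(min_n, max_n + 1):
        for i in range(len(words) - n + 1):
            counts[" ".join(words[i:i + n])] += 1
    return counts.most_common(top)


# Hypothetical sample text, not taken from Philo or Josephus.
sample = (
    "In that day the city fell. In that day the king wept, "
    "and in that day the people fled."
)
for phrase, count in top_phrases(sample, max_n=3, top=5):
    print(count, phrase)
```

Running the same function over each author's text and comparing the two ranked lists would give a first impression of which phrases each one favours.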

Lucene is the best one out there in my opinion, in terms of popularity, community, activity and tooling. I suggest you look at Solr, which is built on top of Lucene. Another open-source indexing framework I found is Egothor, though I am not sure how widely it has been adopted.
And here is a survey that might help you in choosing the right one.
Here you can find more open-source and commercial libraries. I have seen a few of them supporting bindings for more than one programming language. If you decide to go with Lucene, you might need Luke for debugging.
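To get a feel for queries 1 and 2 from the question before setting up an index, they can be tried on a small text with plain Python. The sketch below is not how Lucene works internally; it scans the text directly, and the helper names and the wildcard rules (a lone * stands for one whole word, a * inside a word stands for any run of letters) are assumptions chosen for illustration. For reference, Lucene's classic query syntax expresses a proximity search as "x y"~3.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into simple word tokens (ASCII letters only)."""
    return re.findall(r"[a-z']+", text.lower())


def within(words, x, y, max_gap=3):
    """Return (i, j) index pairs where x and y occur at most max_gap positions apart."""
    xs = [i for i, w in enumerate(words) if w == x]
    ys = [j for j, w in enumerate(words) if w == y]
    return [(i, j) for i in xs for j in ys if 0 < abs(i - j) <= max_gap]


def wildcard_phrase(pattern):
    """Compile a pattern such as 'work* of * hand*' into a regular expression."""
    parts = []
    for token in pattern.lower().split():
        if token == "*":
            # A lone star matches exactly one whole word.
            parts.append(r"[a-z']+")
        else:
            # A star inside a token matches any run of letters.
            parts.append(r"[a-z']*".join(re.escape(p) for p in token.split("*")))
    return re.compile(r"\b" + r"\s+".join(parts) + r"\b")


text = "The works of your hands, and the work of his hand."
words = tokenize(text)

print(within(words, "work", "hand", max_gap=3))
print(wildcard_phrase("work* of * hand*").findall(text.lower()))
```

A scan like this is fine for a few books; an inverted index such as Lucene's becomes worthwhile once the corpus or the number of queries grows.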

Related

Text based UML Diagram Generators [closed]

Closed 8 years ago.
Which generator tools do you know that are able to generate UML (and perhaps other) diagrams out of text (simple ASCII) based input?
I know about http://plantuml-depend.sourceforge.net/screenshot/screenshot.html
I'm looking for something like
http://yuml.me/
https://www.websequencediagrams.com/#
Requirements:
Generator shouldn't have too many dependencies
CLI based - specify input and output file
Output names should be predictable or specifiable
Possible output formats: SVG, PNG, JPEG, PDF
Generator should be free to use, or available for purchase (no subscription)
Ideally the diagram layout can be influenced in case the default layout isn't pleasant
Clean visual diagrams - pleasant to view and read
Actively maintained software
As an alternative to a CLI tool that reads ASCII input, I'd also be interested in UML libraries.
Thanks so far
Claude
As far as I know I keep the most extensive list of textual UML tools here: http://modeling-languages.com/uml-tools/#textual
Hope you'll find at least one that you like
StarUML - http://staruml.sourceforge.net/en/ is an open-source editor that stores its results in an XML file. Export to images is supported.

NLP tools for right-to-left languages? [closed]

Closed 3 years ago.
I'm trying to use NLP within a web application. What I want to do is a little information extraction on Persian sentences, so I need some RTL-friendly NLP tools. I've tried Python's NLTK before, but I don't know whether it supports RTL languages as well. It would be great if it does, because I am also comfortable with Django. Any information on this topic is appreciated.
I have never tried using NLTK for RTL languages, but I think it is perfectly capable of serving your needs, as it is a toolkit, not a system per se.
I could not find any restrictions regarding this. In fact, I have found some other references to people using it for Arabic:
Tokenization of Arabic words using NLTK
Python Arabic NLP
Now, you do need to find some Persian corpora. I could not find any during my brief research, but you can always hit the NLTK Users Mailing List.
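One reason a toolkit need not care about text direction: Python strings hold characters in logical (reading) order, and right-to-left layout is applied only when the text is displayed. The sketch below uses only the standard library, not NLTK, and the Persian sample sentence is an assumption added for illustration.

```python
import unicodedata

# Example sentence (not from the thread); it means "this is a sentence".
text = "این یک جمله است"

# Whitespace splitting works on the logical order, so tokens[0] is the first word read.
tokens = text.split()
print(len(tokens))
print(tokens[0])

# Arabic-script letters carry the Unicode bidirectional class "AL"; renderers use
# this class to lay the text out right to left. The stored string is unaffected.
print({unicodedata.bidirectional(ch) for ch in text if not ch.isspace()})
```

Real Persian text also uses the zero-width non-joiner (U+200C) inside some words, so a tokenizer for production use would need to treat that character as part of the word rather than as a separator.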

Use case diagrams [closed]

Closed 1 year ago.
What are the free options for creating use case diagrams under Windows? I need some simple use case diagrams for a school project.
Why install anything when you can use free online tools such as
http://creately.com/
http://yuml.me/
http://www.gliffy.com/uses/uml-software/
There are multiple options, but not yet mentioned are:
Cacoo - web tool for creating various diagrams,
Dia - standalone tool for creating diagrams, with a Win32 version also available in downloads,
If I remember right, there is a community edition of Magic Draw (the leading app?): https://www.magicdraw.com/
I have already used Poseidon (Community) and ArgoUML; neither is really convenient.
Recently I found a great tool called yEd: http://www.yworks.com/de/products_yed_about.htm It can be run via Web Start. It is not really a UML tool, but it is perfect for use cases.
Apparently there is already something in Eclipse: http://www.eclipse.org/modeling/mdt/?project=uml2 I tried it out; the usability is not yet convincing.
Wikipedia says: http://en.wikipedia.org/wiki/List_of_Unified_Modeling_Language_tools
Edit!
Don't miss the Stack Overflow search at the top right of this page.
There's a pretty nice tool called UML Pad.
http://web.tiscalinet.it/ggbhome/umlpad/umlpad.htm

DotNET String comparisons [closed]

Closed 7 years ago.
After a few harsh lessons I now always use OrdinalIgnoreCase when comparing Strings in DotNET. I've run into maybe 5 different problems to do with numerics, weird alphabets and localisations. Does anyone know of a good site that explains in depth a lot of the problems with culture specific Strings, preferably with a bunch of good examples of where and how something can fail?
Plenty of MSDN info:
String-Related Issues
Best Practices for Developing World-Ready Applications
New Recommendations for Using Strings in Microsoft .NET 2.0
Performing Culture-Insensitive String Comparisons
How Culture Affects Strings
And a search for more info.
I actually found MSDN quite useful for this explanation.
For detailed information, have a look at New Recommendations for Using Strings in Microsoft .NET 2.0.
This one seems pretty good to me.
I live in Turkey, and I know that understanding the Turkish İ character will help you understand the concept better.
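The Turkish alphabet has four letters in this family: dotted i and İ, and dotless ı and I. Under a Turkish culture, .NET's culture-sensitive ToUpper turns "i" into "İ", so comparing "file".ToUpper() with "FILE" can fail. The sketch below shows the underlying character behaviour in Python rather than .NET. Python's case mappings are culture-invariant, so this is an illustration of why the letters are awkward, not a reproduction of the .NET bug; the helper name is made up.

```python
dotted_capital = "\u0130"  # İ, LATIN CAPITAL LETTER I WITH DOT ABOVE
dotless_small = "\u0131"   # ı, LATIN SMALL LETTER DOTLESS I

# Invariant mapping: the dotless small letter uppercases to plain ASCII "I" ...
print(dotless_small.upper() == "I")

# ... and the dotted capital lowercases to "i" followed by a combining dot (U+0307),
# so the result is two code points long and is not equal to plain "i".
print(len(dotted_capital.lower()))
print(dotted_capital.lower() == "i")


def equal_ignore_case(a, b):
    """Culture-invariant, case-insensitive comparison (hypothetical helper)."""
    return a.casefold() == b.casefold()


# Plain ASCII identifiers compare as expected under invariant rules.
print(equal_ignore_case("FILE", "file"))

# A Turkish reader may consider these the same word; invariant folding does not.
print(equal_ignore_case("TITLE", "t\u0131tle"))
```

This is the trade-off behind choosing an ordinal or invariant comparison for identifiers, file names and protocol strings, and a culture-aware one only for text shown to users.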
Here's my favorite: Sorting it all Out

Automated transcription software [closed]

Closed 6 years ago.
I've noticed that the wiki transcriptions for some of the recent Stack Overflow Podcasts are kind of weak. Clearly, this task calls for a computer program. Is transcribing audio to text (ideally with speaker labels so we know who said what) something that could feasibly be accomplished in software? Are there any active open-source software projects attempting to implement such functionality?
Believe me, I have searched for this before. There are few to no speech-to-text programs that are open source or free to use; in my search I did not find any free ones. These things are so hard to code and so expensive to build that they can't really be made with an open-source approach. If you really need this, you would have to purchase it from a company (although I don't know of any off the top of my head).
I've looked into this a little. I tried the Microsoft Speech API but got very poor results. I've been wanting to look into the CMU Sphinx project, especially the Transcriber demo.

Resources