How to make Elasticsearch an efficient search engine? - azure

Can someone please share the points one should consider when making an Elasticsearch search engine efficient?
The experience of developers who have made their search engines faster and more efficient would help new developers like me make Elasticsearch more reliable.
If the question looks irrelevant, please let me know and I will modify it.
Thanks in advance.
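For what it's worth, two settings that come up in almost every Elasticsearch tuning discussion are the refresh interval and the replica count during bulk indexing. A minimal sketch, assuming the official elasticsearch Python client and a hypothetical index named "products":

```python
# Minimal sketch (assumptions: the official `elasticsearch` Python client,
# a hypothetical index named "products"). Two settings commonly tuned for
# bulk-indexing throughput: refresh_interval and number_of_replicas.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

es.indices.put_settings(
    index="products",
    settings={
        "index": {
            "refresh_interval": "30s",   # default is 1s; fewer refreshes = cheaper indexing
            "number_of_replicas": 0,     # restore replicas once the bulk load is done
        }
    },
)
```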

Related

Semantics based code search

We have a large number of repositories. We want to implement a semantics (functionality) based code search on those repositories. Right now we have already implemented a keyword-based code search, in which we crawled through all the repository files and indexed them using Elasticsearch. But that doesn't solve our problem, as some of the repositories are poorly commented and documented, so searching for specific code or libraries becomes difficult.
So my question is: are there any open-source libraries, or any previous work in this field, that could help us index the semantics of the repository files, so that searching the code becomes easier and code reuse becomes possible? I have found some research papers like Semantic Code Browsing and Semantics-Based Code Search, but they were of no use as no actual implementation was given. Can you please suggest some good libraries or projects that could help me achieve this?
P.S.: Companies like Koders, Google, and cocycles.com started code search based on functionality, but most of them have shut down their operations without giving any proper explanation. Can anyone tell me what kind of difficulties they faced?
Not sure if this is what you're looking for, but I wrote https://github.com/google/zoekt, which uses a ctags-based understanding of code to improve ranking.
Take a look at insight.io. It provides semantic search and browsing.

Simple examples of application dealing with eventual-consistency of distributed datastore?

Is anyone aware of simple examples of applications that take into account the 'eventual consistency' caveat of a distributed database like Cassandra? I am hoping there are design patterns that help us deal with this.
If the example is in Python or Java, it'd be easiest for me to understand.
Here is an example from DataStax:
http://docs.datastax.com/en/developer/java-driver/2.1/common/drivers/reference/cqlStatements.html
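To illustrate the idea behind that page: the DataStax drivers let you set a consistency level per statement, which is the main knob for trading consistency against availability. A minimal Python sketch, assuming the cassandra-driver package and a hypothetical demo.users table:

```python
# Minimal sketch (assumptions: the `cassandra-driver` package, a hypothetical
# keyspace "demo" with a "users" table). Choosing QUORUM for both reads and
# writes (W + R > N) gives read-your-writes behavior at higher latency than ONE.
from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

cluster = Cluster(["127.0.0.1"])
session = cluster.connect("demo")

write = SimpleStatement(
    "INSERT INTO users (id, name) VALUES (%s, %s)",
    consistency_level=ConsistencyLevel.QUORUM,
)
session.execute(write, (1, "arun"))

read = SimpleStatement(
    "SELECT name FROM users WHERE id = %s",
    consistency_level=ConsistencyLevel.QUORUM,
)
print(session.execute(read, (1,)).one())
```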

What algorithms does AlchemyAPI use?

I'm trying to develop something that extracts keywords from a text. I know AlchemyAPI works best for this. Now I want to know what algorithms AlchemyAPI uses so that I can implement them on my own. Does anyone have any idea about it? Please share. Thanks in advance.
I have no idea what specific algorithms AlchemyAPI uses (I'm guessing they're on the extreme end of proprietary), but the Stanford NLP group has a lot of information and code that may be useful:
http://www-nlp.stanford.edu/software/lex-parser.shtml
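AlchemyAPI's internals are proprietary, but TF-IDF scoring is a common baseline for keyword extraction and a reasonable place to start. A minimal sketch with scikit-learn (the sample documents are hypothetical, and this is not a claim about what AlchemyAPI actually does):

```python
# Minimal sketch: TF-IDF as a baseline keyword extractor (an assumption
# about a sensible starting point, not AlchemyAPI's actual algorithm).
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "Elasticsearch is a distributed search engine built on Lucene.",
    "Cassandra is a distributed database with tunable consistency.",
]

vectorizer = TfidfVectorizer(stop_words="english")
tfidf = vectorizer.fit_transform(docs)
terms = vectorizer.get_feature_names_out()

# Treat the top-3 TF-IDF terms in each document as its "keywords".
for doc_idx, row in enumerate(tfidf.toarray()):
    top = sorted(zip(terms, row), key=lambda t: t[1], reverse=True)[:3]
    print(doc_idx, [term for term, _ in top])
```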

Indexing and searching in hadoop

Could somebody tell me how I can address the issue mentioned below?
I have a large number of text files stored in HDFS. My client application needs to find the files related to a particular search word. I would like to know whether this is possible with Apache Solr. Any help is greatly appreciated.
Thanks,
Arun
I think the first question you need to think about is whether the search needs to be near real-time (the index updated very often) or can be refreshed less often. If it is the former, I would strongly advise you to use Elasticsearch. But don't rely solely on my advice; this question has some very good answers on the Elasticsearch vs. Solr debate:
Solr vs. ElasticSearch
As for your question about using Hadoop with Apache Solr, here are some useful links I found:
http://www.likethecolor.com/2010/09/26/using-hadoop-to-create-solr-indexes
http://architects.dzone.com/articles/solr-hadoop-big-data-love
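To make the Solr side concrete, here is a minimal sketch of indexing and searching documents with the pysolr client (the core name and fields are hypothetical; extracting the text from HDFS, e.g. with a MapReduce job as in the links above, is a separate step):

```python
# Minimal sketch (assumptions: a running Solr core named "hdfs_docs",
# the `pysolr` client, and text already extracted from the HDFS files).
import pysolr

solr = pysolr.Solr("http://localhost:8983/solr/hdfs_docs", timeout=10)

# One document per HDFS file.
solr.add([
    {"id": "hdfs:///data/file1.txt", "content": "text of the first file"},
    {"id": "hdfs:///data/file2.txt", "content": "text of the second file"},
])
solr.commit()

# Find the files related to a particular search word.
for hit in solr.search("content:hadoop"):
    print(hit["id"])
```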

Search term suggestions

This question has been asked in various ways before, but I'm wondering if people who have experience with automatic search term suggestion could offer advice on the most useful and efficient approaches. Here's the scenario:
I'm just starting on a website for a book that is a dictionary of terms (roughly 1,000 entries, with 300-word explanations on average), many of which are fairly obscure, and it is likely that many visitors to the site would not know how to spell the words. The publisher wants to make full-text search available for every entry. So I'm hoping to implement a search engine with spelling correction. The main site will probably be built in a PHP framework (or possibly Django) with a MySQL database.
Can anyone with experience in this area give advice on the following:
With a set corpus of this nature, should I be using something like Lucene or Sphinx for the search engine?
As far as I can tell, neither of these has a built-in suggestion function. So it seems I will need to integrate one or more of the following. What are the advantages / disadvantages of:
Suggestion requests through Google's search API
A phonetic comparison algorithm like metaphone() in PHP
A spell checking system like Aspell
A simpler spelling script such as Peter Norvig's
A Levenshtein function
I'm concerned about the specificity of my corpus, and don't want Google to start suggesting things that have nothing to do with this book. I'm also not sure whether I should try to use both a metaphone comparison and a Levenshtein comparison, or some other combination of techniques to capture both typos and attempts at phonetic spelling.
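To make that last idea concrete, here is a minimal sketch of combining a phonetic filter with an edit-distance ranking, using the jellyfish library (the corpus and distance threshold are hypothetical placeholders):

```python
# Minimal sketch (assumption: the `jellyfish` library, which provides both
# metaphone() and levenshtein_distance()). The phonetic key catches attempts
# at phonetic spelling; the edit distance catches ordinary typos.
import jellyfish

corpus = ["chiaroscuro", "sfumato", "impasto", "pentimento"]

def suggest(query, max_distance=3):
    phonetic = jellyfish.metaphone(query)
    # Prefer words that sound alike; fall back to the whole corpus otherwise.
    candidates = [w for w in corpus if jellyfish.metaphone(w) == phonetic] or corpus
    ranked = sorted(candidates, key=lambda w: jellyfish.levenshtein_distance(query, w))
    return [w for w in ranked if jellyfish.levenshtein_distance(query, w) <= max_distance]

print(suggest("kiaroscuro"))  # likely ['chiaroscuro'], depending on the threshold
```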
You might want to consider Apache Solr, which is a web service encapsulation of Lucene and runs in a J2EE container like Tomcat. You'll get term suggestion, spell checking, stemming, and much more. It's really very nice.
See here for a full listing of its features relating to queries.
There are Django and PHP libraries for Solr.
I wouldn't recommend using Google Suggest for such a specialised corpus anyway, and with Solr you won't need it.
Hope this helps.
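As a concrete illustration of the spell-check feature mentioned above, here is a minimal sketch of querying Solr's SpellCheckComponent over HTTP (assuming a hypothetical core named "dictionary" with the component enabled in solrconfig.xml, plus Python's requests library):

```python
# Minimal sketch (assumptions: a Solr core named "dictionary" with the
# SpellCheckComponent enabled in solrconfig.xml; the `requests` library).
import requests

resp = requests.get(
    "http://localhost:8983/solr/dictionary/select",
    params={
        "q": "kiaroscuro",            # the (possibly misspelled) user query
        "spellcheck": "true",         # ask the SpellCheckComponent for suggestions
        "spellcheck.collate": "true", # also return a rewritten whole query
        "wt": "json",
    },
)
print(resp.json().get("spellcheck", {}).get("suggestions", []))
```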