NoSQL Word Proximity - search

Are there any NoSQL databases that support word proximity searching similar to Lucene?
I have a client who would like the flexibility of NoSQL with the search power of Lucene or some other search tool. The average amount of data to be searched is 200 GB.

Take a look at tjake's Solandra (formerly Lucandra). "Solandra is a real-time distributed search engine built on Apache Solr and Apache Cassandra."
Solandra "supports most out-of-the-box Solr functionality (search, faceting, highlights)"

If you can manage a .NET/Windows solution, also check out RavenDB, which has Lucene baked into it. If not, Schild's answer is a good one. You can also use Lucene separately with MongoDB, but your app would have to maintain the index itself...
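To make the last option concrete, here is a minimal sketch of what an app-maintained index could look like for proximity matching. It is a toy positional index in plain Python, not Lucene's implementation. The `PositionalIndex` class, its `near` method, and the sample documents are made up for illustration, and `max_distance` simply counts token positions, which only approximates Lucene's slop semantics.

```python
from collections import defaultdict
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class PositionalIndex:
    """Toy positional inverted index: term -> {doc_id: [token positions]}."""

    def __init__(self):
        self.postings = defaultdict(dict)

    def add(self, doc_id, text):
        # Record the position of every token so distances can be checked later.
        for pos, term in enumerate(tokenize(text)):
            self.postings[term].setdefault(doc_id, []).append(pos)

    def near(self, term_a, term_b, max_distance):
        """Return sorted doc ids where both terms occur within max_distance positions."""
        a = self.postings.get(term_a.lower(), {})
        b = self.postings.get(term_b.lower(), {})
        hits = []
        for doc_id in a.keys() & b.keys():
            if any(abs(pa - pb) <= max_distance
                   for pa in a[doc_id] for pb in b[doc_id]):
                hits.append(doc_id)
        return sorted(hits)


index = PositionalIndex()
index.add(1, "Lucene supports word proximity search out of the box")
index.add(2, "Proximity matters less when the word is far away from search terms")

print(index.near("proximity", "search", 1))
print(index.near("proximity", "search", 10))
```

In a real setup the app would also have to update this structure on every insert, update, and delete in MongoDB, which is the maintenance burden mentioned above.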

Lucene is a NoSQL database.

Probably too late to be useful but check out MarkLogic. It's a document database with integrated full-text search (not bolt-on Lucene). You can see a quick demo via http://developer.marklogic.com/try/corona/index

Related

How to Migrate Lucene to Solr

I am Japanese and I am using a translator so sorry if my English is strange.
I am working on a website for my job and I am looking for a way to migrate the search function from Lucene to Solr.
Is there any software or other tool out there that would make this possible? (Is it distributed on official websites?)
And if not, what means are available? Please let us know how to do this. If you have a similar answer, please provide a link to it.
You cannot migrate Lucene to Solr as Solr is built on top of Lucene.
Lucene is the core search engine library, Solr is the search server.
Apache Lucene is the core search engine library, while Apache Solr builds on Lucene and adds features that Lucene does not provide out of the box.
There is no direct option or tool for this kind of migration.
There are many options for indexing data into Solr.
It all depends on where your data lives. For example, if your data is in a database, one way is to use Solr's DataImportHandler to index it into Solr.
Please refer to the documentation for more options.
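As one of those other options, documents can be pushed to Solr over HTTP as JSON. The following is only a sketch under assumptions: the host `http://localhost:8983/solr`, the core name `articles`, the field names, and the helper `build_solr_update_request` are all hypothetical. It builds the request without sending it, so nothing here depends on a running Solr.

```python
import json
import urllib.request


def build_solr_update_request(base_url, core, docs, commit=True):
    """Build a POST request that sends a JSON array of documents to Solr's update handler."""
    url = f"{base_url.rstrip('/')}/{core}/update"
    if commit:
        url += "?commit=true"  # ask Solr to make the documents searchable right away
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Rows as they might come out of the existing data source (made-up sample data).
rows = [
    {"id": "1", "title": "First page", "body": "Text that used to live in the Lucene index"},
    {"id": "2", "title": "Second page", "body": "More searchable text"},
]

request = build_solr_update_request("http://localhost:8983/solr", "articles", rows)
print(request.get_method(), request.full_url)

# To actually send it against a running Solr instance:
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```

The point is that the data is re-indexed from its original source into Solr, rather than an existing Lucene setup being converted in place.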

Trying to use Lucene search with Node.js and MongoDB

I am trying to learn about Lucene to build a robust search mechanism for a MEAN stack application.
I have understood the conceptual part of Apache Solr from this series of videos:
https://www.youtube.com/watch?v=Zh_aYQkG0Wc&index=3&list=PLJbE6j2EG1pZ7YfU05bCqdv5bDkKo75nQ
but I am not sure how to start implementing it.
Most of the sources I have referred to use Java, and I have a few points of confusion:
*What are Lucene and Solr, i.e. do they both mean the same thing?
*If I want to build a search mechanism, how should I use them?
*Is there an npm module that will help in performing the search?
Can anybody please point me to a source that will help with search for a MongoDB, Node.js, and Lucene stack, or explain how to build a module (mechanism) that performs search on MongoDB collections?
What are Lucene and Solr, i.e. do they both mean the same thing?
Lucene is a search library built in Java. Solr is a web application built on top of Lucene. Simply put, Solr = Lucene + added features.
Lucene:
Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-text search.
Solr:
Solr's major features include powerful full-text search, hit highlighting, faceted search, dynamic clustering, database integration, and rich document (e.g., Word, PDF) handling. Solr is highly scalable, providing distributed search and index replication, and it powers the search and navigation features of many of the world's largest internet sites. This site is powered by Solr.
Solr uses the Lucene Java search library at its core for full-text indexing and search, and has REST-like HTTP/XML and JSON APIs.
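Because Solr is reached over HTTP, any client can query it, whether that is Node.js or anything else. Here is a rough sketch of what a query URL looks like, written in Python only for illustration. The host, the core name `articles`, the field name `text`, and the helper `build_select_url` are assumptions, not something from a real deployment.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_select_url(base_url, core, query, rows=10):
    """Build a URL for Solr's select handler that asks for a JSON response."""
    params = {"q": query, "rows": rows, "wt": "json"}
    return f"{base_url.rstrip('/')}/{core}/select?{urlencode(params)}"


# A Lucene-syntax proximity query: both words within 5 positions of each other.
url = build_select_url(
    "http://localhost:8983/solr", "articles", 'text:"mongodb search"~5'
)
print(url)

# Decoding the query string again shows exactly what Solr would receive.
print(parse_qs(urlsplit(url).query))
```

The same URL could be requested from a Node.js HTTP client, and the JSON response parsed there.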
Mongo and Lucene
https://github.com/rstiller/mongo-lucene
https://www.jayway.com/2010/11/14/full-text-search-with-mongodb-and-lucene-analyzers/
Mongo and Solr
solr Data Import Handlers for MongoDB
How to import and index mongodb data in solr 4
What would be the motivation to integrate mongodb with solr
http://blog.mongodb.org/post/29127828146/introducing-mongo-connector
https://github.com/mongodb-labs/mongo-connector
Hope it Helps!

Recommended Approaches for building/designing a search engine for my website

I would like to build a search engine for my website so I can quickly find relevant content. I've done quite a few Google searches, and discovered ElasticSearch and Solr (which both sit on top of Lucene), and Whoosh (Python-based).
But are all of these search engines just building an "inverted-index" on top of the data? What are some other algorithmic approaches for getting higher quality searches?
I was intrigued by this blog post using collaborative filtering on top of Solr, which returns related search queries:
http://www.opensourceconnections.com/2013/08/25/semantic-search-with-solr-and-python-numpy/
Are there other common techniques that I should be aware of? Are there other libraries sitting on top of ElasticSearch/Solr that I could just plug into, and use "out-of-the-box"?
Any links or tips would be greatly appreciated!
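On the inverted-index part of the question: the index itself mostly answers "which documents contain this term", and result quality comes from the scoring layered on top of it. The sketch below shows that split with a toy index and a simplified TF-IDF score. It is not the formula Lucene, Solr, or Elasticsearch actually use, and the function names and sample documents are invented for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def analyze(text):
    """Very small 'analyzer': lowercase and split into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Build an inverted index: term -> {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, count in Counter(analyze(text)).items():
            index[term][doc_id] = count
    return index


def search(index, num_docs, query):
    """Rank documents by a simplified TF-IDF score, best match first."""
    scores = defaultdict(float)
    for term in analyze(query):
        postings = index.get(term, {})
        if not postings:
            continue
        # Terms that appear in fewer documents contribute more to the score.
        idf = math.log(1 + num_docs / len(postings))
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    return sorted(scores, key=lambda d: (-scores[d], d))


docs = {
    "a": "solr and elasticsearch both sit on top of lucene",
    "b": "whoosh is a python search library",
    "c": "lucene builds an inverted index; lucene scores matches",
}
index = build_index(docs)
print(search(index, len(docs), "lucene index"))
```

Techniques such as field boosts, synonyms, or the collaborative filtering from the linked post change the scoring or the query, while the inverted index underneath stays the same.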
You haven't mentioned what tech stack you are working on.
If you use Ruby on Rails, I would recommend Tire, which is a gem that gives a DSL wrapper over ElasticSearch. Essentially, it allows you to index your data in Elasticsearch.
For Rails, Sunspot is a very popular gem that people use to interface with Solr.
For .NET - SolrNET is a great Solr client.
The other part of your question (around implementing a good search engine) is too broad. I would recommend reading a good book such as Lucene in Action to get a feel for what Solr/Elasticsearch can do.
I do have a few notes that I wrote a while back, you can read about some of my experience in search here.
Edit:
Since you work in Python, I would recommend Haystack, although it is specific to Django. It is very versatile for our needs. However, if you are not using Django, I can think of solrpy as a Solr client. Haystack works with both Solr and Elasticsearch.
I suggest you learn the Solr API, because it has been in development for 4 to 5 years, so you can find lots of plug-ins for Solr, such as a related-search API. Elasticsearch is very easy to configure, but it is a very young engine and needs more development.
Pyes is a well-documented Python client for Elasticsearch.
Also, this YouTube video provides a good overview of using Elasticsearch with Python.
I suggest you use Google Custom Search Engine.
Have a look here:
https://www.google.com/cse/all
We have developed several search engines both on Solr and Elastic. Solr used to be the best as it provided most of the tools needed to admin and debug your indexes. Right now Elastic offers the same features as Solr either natively or via plugins. Plus it is easier to configure in high performance/high availability scenarios (easy to shard or cluster).
Your technology stack is irrelevant. Both Solr and Elastic have clients for nearly every language, plus you can access both via plain HTTP.
That said, each search engine applies to a problem domain. Tuning Elastic or Solr to retrieve relevant results is a bit of an art with some trial and error.
You will have to define analyzers for each field you search on, according to your search patterns and the kind of results you expect.
Eventually, to create a search engine with a single input that searches across disparate attributes of a document type, you may need DisMax queries, where you can boost results depending on which document fields the search terms match.
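As a rough illustration of the DisMax idea, here is how the request parameters for Solr's DisMax parser could be assembled. `defType`, `q`, and `qf` are Solr's parameter names; the field names `title`, `tags`, and `body`, the boost values, and the helper `build_dismax_params` are assumptions chosen for the example.

```python
from urllib.parse import urlencode, parse_qs


def build_dismax_params(user_query, field_boosts):
    """Build a Solr query string that searches several fields with per-field boosts."""
    # qf lists the fields to search, each as field^boost, separated by spaces.
    qf = " ".join(f"{field}^{boost}" for field, boost in field_boosts.items())
    return urlencode({"defType": "dismax", "q": user_query, "qf": qf, "wt": "json"})


# A match in the title counts three times as much as a match in the body.
params = build_dismax_params("lucene proximity", {"title": 3, "tags": 2, "body": 1})
print(params)
print(parse_qs(params)["qf"])
```

The single search box sends only the raw user text in `q`; which fields count for how much is decided on the server side through `qf`, and those boosts are what usually gets tuned by trial and error.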
To summarize: go for Elastic, and get some plugins or frontends. Two suggestions:
Inquisitor: for testing your analyzers
Elastic Head: for administration purposes

dot net version of Solr

We are going to use Solr as our search server, but all of our web interface is in ASP, and our data is an MS SQL Server database. What is the best solution? Shall we use Java-based or Dot Net-based version of Solr?
I'm not aware of any Solr port for .NET; you can have a look at this question to learn more. I would use the original Solr written in Java, and a client library written in the language you prefer, for example SolrNet, to communicate with it.
Check out this: Lucene.net
You can use the Solr.Net client, or you can create your own client for Lucene.Net like this. There is no version of Solr written in .NET, but you could create one since the engine is available in .NET.
The most suitable option is SolrNet, which is up to date with the latest versions of Solr and quite mature (minimal issues).

What's the best deployment for "like" search in MVC/Azure

I use MVC3 on Azure, and I would like to have a "like" kind of search,
e.g. http://msdn.microsoft.com/en-us/library/ms179859.aspx
First question: Does Lucene support "like" search? I tried asking Google, but it is very difficult to search for the word "like" without getting results like: I like to use Lucene :)
Second: What kind of performance can I get from SQL Azure for a "like" search, with only id (int) as the key, text (string(100)) for the "like" search, and around 10 million rows? I tried, but it does not seem to work; it always times out. Alternatively, you can answer the question as: I know there's a way to improve "like" search in SQL Azure.
Third question: Is there any other product that works well with the Azure platform and supports "like" search with reasonable performance (less than 2 seconds for the sample database above)?
Thanks.
SQL Azure doesn't support full-text indexing, so 'LIKE' is limited to the ANSI SQL operator. This is wholly inadequate for general searching. In general, on the cloud (Azure) you want to avoid using SQL for searching anyway; it is the wrong place for it from a scalability point of view.
As you suggest, a Lucene-based search engine is the way to go, but I would recommend using Solr (the Apache/Java Lucene server). Solr can still be hosted in Azure and you will find a lot more community support, documentation and help for it.
Lucene does support LIKE search and there is a library specific for Lucene.NET that leverages Azure Storage for the Lucene index. This allows you to provide a fault tolerant Lucene index that will scale well in the cloud.
http://code.msdn.microsoft.com/windowsazure/Azure-Library-for-83562538
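To give an idea of how a LIKE pattern relates to Lucene, the sketch below translates a pattern into Lucene's wildcard syntax, where `*` matches any run of characters and `?` matches a single character. The helper `like_to_lucene_wildcard` is hypothetical, it is written in Python purely for illustration, and it assumes a single-term pattern without a LIKE ESCAPE clause.

```python
# Characters that have special meaning in Lucene's classic query syntax.
LUCENE_SPECIAL = set('+-&|!(){}[]^"~*?:\\/')


def like_to_lucene_wildcard(pattern):
    """Translate a SQL LIKE pattern into a Lucene wildcard term (hypothetical helper)."""
    out = []
    for ch in pattern:
        if ch == "%":
            out.append("*")        # LIKE '%' -> any run of characters
        elif ch == "_":
            out.append("?")        # LIKE '_' -> exactly one character
        elif ch in LUCENE_SPECIAL:
            out.append("\\" + ch)  # escape so the character stays literal
        else:
            out.append(ch)
    return "".join(out)


print(like_to_lucene_wildcard("%lucene%"))
print(like_to_lucene_wildcard("sol_"))
```

Note that a pattern starting with `%` becomes a leading wildcard, which Lucene's classic query parser rejects unless it is explicitly allowed, and which tends to be slow. Also, on an analyzed field the wildcard is matched against individual terms rather than the whole string, so the behavior is close to LIKE but not identical.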
Solr is a good option, but you will have to manage the storage of the index yourself unless you extend Solr to run on Azure storage yourself.
You may want to look into implementing Solr on Azure. There's a good write-up with demos and tutorials here:
http://wiki.apache.org/solr/SolrOnWindowsAzure
