I am trying to learn Lucene in order to build a robust search mechanism for a MEAN stack application.
I have understood the conceptual side of Apache Solr from this series of videos:
https://www.youtube.com/watch?v=Zh_aYQkG0Wc&index=3&list=PLJbE6j2EG1pZ7YfU05bCqdv5bDkKo75nQ
but I am not sure how to start implementing it.
Most of the sources I have referred to use Java, and I have a few points of confusion:
*What are Lucene and Solr? Do they both mean the same thing?
*If I want to build a search mechanism, how should I use them?
*Is there an npm module that helps with performing the search?
Can anybody please point me to a source that explains how to do search with a MongoDB, Node.js and Lucene stack, or how to build a module (mechanism) that performs search on MongoDB collections?
What are Lucene and Solr? Do they both mean the same thing?
Lucene is a search library built in Java. Solr is a web application built on top of Lucene. Simply put, Solr = Lucene + added features.
Lucene:
Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-text search.
Solr:
Solr's major features include powerful full-text search, hit highlighting, faceted search, dynamic clustering, database integration, and rich document (e.g., Word, PDF) handling. Solr is highly scalable, providing distributed search and index replication, and it powers the search and navigation features of many of the world's largest internet sites. This site is powered by Solr.
Solr uses the Lucene Java search library at its core for full-text indexing and search, and has REST-like HTTP/XML and JSON APIs.
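To make the "REST-like HTTP ... JSON APIs" point concrete, here is a minimal sketch of querying Solr over plain HTTP. It is written in Python purely for illustration; the same request can be sent from Node.js with any HTTP client. The host, port and the core name "articles" are assumptions, not something from the question, so adjust them to your installation.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed values: adjust the host, port and core name to your installation.
SOLR_BASE = "http://localhost:8983/solr"
CORE = "articles"  # hypothetical core name


def build_select_url(query, rows=10, start=0):
    """Return the URL of a Solr /select request that asks for a JSON response."""
    params = {"q": query, "rows": rows, "start": start, "wt": "json"}
    return f"{SOLR_BASE}/{CORE}/select?{urlencode(params)}"


def search(query, rows=10):
    """Send the query to a running Solr server and return the matching documents."""
    with urlopen(build_select_url(query, rows=rows)) as resp:
        payload = json.load(resp)
    # Solr puts the hits under response.docs in its JSON output.
    return payload["response"]["docs"]
```

Calling search("title:lucene") needs a running Solr server; build_select_url can be tried on its own to see what is sent over the wire.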
Mongo and Lucene
https://github.com/rstiller/mongo-lucene
https://www.jayway.com/2010/11/14/full-text-search-with-mongodb-and-lucene-analyzers/
Mongo and Solr
Solr Data Import Handlers for MongoDB
How to import and index MongoDB data in Solr 4
What would be the motivation to integrate MongoDB with Solr
http://blog.mongodb.org/post/29127828146/introducing-mongo-connector
https://github.com/mongodb-labs/mongo-connector
Hope it Helps!
Related
I am Japanese and I am using a translator so sorry if my English is strange.
I am working on a website for my job and I am looking for a way to migrate the search function from Lucene to Solr.
Is there any software that would make this possible? (Is it distributed on an official website?)
If not, what other means are available? Please let me know how to do this. If a similar question has already been answered, please provide a link to it.
You cannot migrate Lucene to Solr as Solr is built on top of Lucene.
Lucene is the core search engine library, Solr is the search server.
Apache Lucene is the base library for the search engine, while Apache Solr builds on Lucene and adds built-in features that Lucene doesn't provide out of the box.
There is no direct option or tool available for this.
There are many options available for indexing data into Solr.
It all depends on where your data lives. For example, if your data is in a database, one way is to use Solr's DataImportHandler to index it into Solr.
Please refer to the Solr documentation for more options.
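Besides the DataImportHandler, another of those options is to push documents to Solr's JSON update handler yourself, which is often the simplest route when re-indexing data that used to live in a raw Lucene index. Below is a rough sketch in Python; the URL, the core name "articles" and the field names in the usage note are assumptions for illustration only.

```python
import json
from urllib.request import Request, urlopen

# Assumed endpoint: a core named "articles" on a local Solr server.
# commit=true makes the documents searchable as soon as the request returns.
SOLR_UPDATE_URL = "http://localhost:8983/solr/articles/update?commit=true"


def build_update_request(docs):
    """Build a POST request that sends a list of documents to Solr as JSON."""
    body = json.dumps(docs).encode("utf-8")
    return Request(
        SOLR_UPDATE_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def index_documents(docs):
    """Send the documents to a running Solr server and return its JSON reply."""
    with urlopen(build_update_request(docs)) as resp:
        return json.load(resp)
```

For example, index_documents([{"id": "1", "title": "Hello Solr"}]) would add one document, provided the fields exist in the core's schema.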
I want to use StormCrawler with an RDBMS engine such as Oracle, MySQL, or Postgres. But in the storm-crawler-sql module, we only have a SqlSpout and a StatusUpdaterBolt. We did not find any class for indexing crawl results to the SQL database. Is there any technical reason behind this?
What's wrong with the IndexerBolt?
I'm totally new to Solr. My professor asked me to build a search engine that can search causal and conditional texts with Solr.
I have already built a core, imported some text data into the Solr server, and can query it from the Solr Admin UI. But I don't know how to build a search engine that has a web interface like Google or other search engines and integrate it with Solr. Please tell me how to do it step by step.
You need to pick a programming language and use one of the libraries available for that language to communicate with Solr. In the end, you could write your own library that fires HTTP requests against the Solr server, but usually a library that abstracts this away is already available.
For example, if you want to use PHP, you could use the Solarium library. You would write your web interface using the usual technologies (HTML, CSS, JS), and the backend of your application would be written in PHP (which communicates with Solr and executes the queries that you specify).
If you want a more direct approach (skipping PHP, for example) and you have no problem exposing the Solr server to the public (a really, really bad idea, unless you put some sort of proxy between Solr and the user), you could use something like ajax-solr. This allows you to fire Solr queries directly from the browser (similar to how the Solr Admin UI works).
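To illustrate what "some sort of proxy" could do, here is a small sketch of the sanitizing step a backend might apply before forwarding a browser request to Solr. It is in Python rather than PHP and is only an outline under assumptions: the parameter limits and the Solr URL are made up for the example, and a real proxy would also need to vet the query string itself.

```python
from urllib.parse import urlencode

MAX_ROWS = 50  # assumed upper bound on page size


def _bounded_int(value, default, low, high):
    """Parse value as an int and clamp it to [low, high]; fall back to default."""
    try:
        number = int(value)
    except (TypeError, ValueError):
        number = default
    return max(low, min(number, high))


def sanitize_params(user_params):
    """Keep only q, start and rows; every other user-supplied parameter is dropped."""
    return {
        "q": str(user_params.get("q") or "*:*"),
        "start": _bounded_int(user_params.get("start"), 0, 0, 10_000),
        "rows": _bounded_int(user_params.get("rows"), 10, 1, MAX_ROWS),
        "wt": "json",
    }


def solr_url_for(user_params, base="http://localhost:8983/solr/articles/select"):
    """Build the URL the backend would request from Solr (base is an assumed address)."""
    return f"{base}?{urlencode(sanitize_params(user_params))}"
```

The point of the design is that the browser never chooses the request handler or any parameter outside the short whitelist, so Solr itself can stay on a private network.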
I would like to build a search engine for my website so I can quickly find relevant content. I've done quite a few Google searches and discovered ElasticSearch and Solr (which both sit on top of Lucene), and Whoosh (Python-based).
But are all of these search engines just building an "inverted-index" on top of the data? What are some other algorithmic approaches for getting higher quality searches?
I was intrigued by this blog post using collaborative filtering on top of Solr, which returns related search queries:
http://www.opensourceconnections.com/2013/08/25/semantic-search-with-solr-and-python-numpy/
Are there other common techniques that I should be aware of? Are there other libraries sitting on top of ElasticSearch/Solr that I could just plug into, and use "out-of-the-box"?
Any links or tips would be greatly appreciated!
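For readers unfamiliar with the term used in the question: an inverted index maps each term to the set of documents that contain it, so a query only touches the entries for its own terms. The toy version below is a sketch of the idea only; real engines such as Lucene also store positions and frequencies and apply analyzers and scoring, none of which is shown here.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs):
    """Map every term to the set of ids of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search_all_terms(index, query):
    """Return the ids of the documents that contain every term of the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    # Copy the first posting set so the intersection does not modify the index.
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result
```

With documents such as {1: "Solr is built on Lucene", 2: "Whoosh is written in Python"}, a query is answered by intersecting the sets stored for its terms instead of scanning every document.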
You haven't mentioned what tech stack you are working on.
If you use Ruby on Rails, I would recommend Tire, which is a gem that gives a DSL wrapper over ElasticSearch. Essentially, it allows you to index your data in Elasticsearch.
For Rails, Sunspot is a very popular gem that people use to interface with Solr.
For .NET - SolrNET is a great Solr client.
The other part of your question (around implementing a good search engine) is too broad. I would recommend reading a good book such as Lucene in Action to get a feel for what Solr/Elasticsearch can do.
I do have a few notes that I wrote a while back, you can read about some of my experience in search here.
Edit:
Since you work with Python, I would recommend Haystack, although it is specific to Django. It is very versatile for our needs. However, if you are not using Django, I can think of solrpy as a Solr client. Haystack works with both Solr and Elasticsearch.
I suggest you learn the Solr API: it has been in development for four or five years, so you can find lots of plug-ins for it, such as a related-search API. Elasticsearch is very easy to configure, but it is a very young engine and still needs more development.
Pyes is a well-documented Python client for Elasticsearch.
Also, this YouTube video provides a good overview of using Elasticsearch with Python.
I suggest you use Google Custom Search Engine.
Have a look here:
https://www.google.com/cse/all
We have developed several search engines both on Solr and Elastic. Solr used to be the best as it provided most of the tools needed to admin and debug your indexes. Right now Elastic offers the same features as Solr either natively or via plugins. Plus it is easier to configure in high performance/high availability scenarios (easy to shard or cluster).
Your technology stack is irrelevant. Both Solr and Elastic have clients for nearly every language, plus you can access both via plain HTTP.
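As an illustration of the plain-HTTP route, the sketch below builds an Elasticsearch search request with nothing but the Python standard library. The server address, the index name "articles" and the field name in the usage note are assumptions made up for the example.

```python
import json
from urllib.request import Request, urlopen

ES_BASE = "http://localhost:9200"  # assumed Elasticsearch address
INDEX = "articles"  # hypothetical index name


def build_es_search_request(field, text, size=10):
    """Build a POST to the _search endpoint with a simple match query."""
    body = {"query": {"match": {field: text}}, "size": size}
    return Request(
        f"{ES_BASE}/{INDEX}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_search(request):
    """Send the request to a running server and return the list of hits."""
    with urlopen(request) as resp:
        return json.load(resp)["hits"]["hits"]
```

run_search(build_es_search_request("title", "lucene")) would return the matching hits from a live cluster; Solr can be queried in the same spirit with a GET on its /select handler.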
That said, each search engine applies to a specific problem domain. Tuning Elastic or Solr to retrieve relevant results is a bit of an art, with some trial and error.
You will have to define analyzers for each field you'll search on, according to your search patterns and the kind of results you expect.
Eventually, to create a search engine with a single input that searches across disparate attributes of a document type, you may need DisMax queries, where you can boost results depending on which document fields the search terms match.
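As a sketch of the DisMax idea, assuming Solr's DisMax query parser: the single search box goes into q, and the qf parameter lists the fields to search, each with its own boost. The field names and boost values below are hypothetical.

```python
from urllib.parse import urlencode


def build_dismax_params(user_input, field_boosts, rows=10):
    """Build Solr request parameters for a DisMax query over several fields.

    field_boosts maps a field name to its boost, e.g. {"title": 3, "body": 1}.
    """
    qf = " ".join(f"{field}^{boost}" for field, boost in field_boosts.items())
    return {
        "defType": "dismax",  # select the DisMax query parser
        "q": user_input,      # raw text typed into the single search box
        "qf": qf,             # fields to search, with per-field boosts
        "rows": rows,
        "wt": "json",
    }


def build_dismax_query_string(user_input, field_boosts, rows=10):
    """URL-encode the parameters so they can be appended to a /select URL."""
    return urlencode(build_dismax_params(user_input, field_boosts, rows))
```

With {"title": 3, "body": 1}, a match in the title counts three times as much as a match in the body, which is usually the starting point for the trial and error mentioned above.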
To summarize: go for Elastic, and get some plugins or frontends. Two suggestions:
Inquisitor: for testing your analyzers
Elastic Head: for administration purposes
Are there any NoSQL databases that support word proximity searching similar to Lucene?
I have a client that would like the flexibility of NoSQL with the search power of Lucene or some other search tool. The average amount of data to be searched is 200 GB.
Take a look at tjake's Solandra (formerly Lucandra). "Solandra is a real-time distributed search engine built on Apache Solr and Apache Cassandra."
Solandra "supports most out-of-the-box Solr functionality (search, faceting, highlights)"
If you can manage a .NET/Windows solution, also check out RavenDB, which has Lucene baked into it. If not, Schild's answer is a good one. You can also use Lucene separately with MongoDB, but your app would have to maintain the index itself.
Lucene is a NoSQL database.
Probably too late to be useful but check out MarkLogic. It's a document database with integrated full-text search (not bolt-on Lucene). You can see a quick demo via http://developer.marklogic.com/try/corona/index