How to build a simple search engine using Solr / Lucene?

I'm totally new to Solr. My professor asked me to build a search engine with Solr that can search causation and conditional texts.
I have already built a core, imported some text data into the Solr server, and can query it from the Solr Admin. But I don't know how to build a search engine with a web interface, like Google or other search engines, and integrate it with Solr. Please tell me how to do it step by step.

You need to pick a programming language and use one of the libraries available for that language to communicate with Solr. You could also write your own library that fires HTTP requests against the Solr server, but usually a library that abstracts this away is already available.
For example, if you want to use PHP, you could use the Solarium library. You would write your web interface using the usual technologies (HTML, CSS, JS), and the backend of your application would be written in PHP, which communicates with Solr and executes the queries you specify.
If you want a more direct approach (skipping PHP, for example) and you have no problem exposing the Solr server to the public (a really, really bad idea unless you put some sort of proxy between Solr and the user), you could use something like ajax-solr. This will allow you to fire Solr queries directly from the browser (similar to how the Solr Admin UI works).
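To make the "backend talks to Solr over HTTP" part concrete, here is a minimal sketch in Python using only the standard library. It assumes a Solr server at http://localhost:8983/solr and a core called "texts"; both are placeholders, as are the function names. A real application would more likely use a client library, as described above.

```python
import json
import urllib.parse
import urllib.request


def build_select_url(base_url: str, core: str, user_query: str, rows: int = 10) -> str:
    """Build the URL for a query against Solr's /select request handler."""
    # urlencode escapes the user's input so it is safe to place in a URL.
    params = urllib.parse.urlencode({"q": user_query, "rows": rows, "wt": "json"})
    return f"{base_url.rstrip('/')}/{core}/select?{params}"


def search(base_url: str, core: str, user_query: str, rows: int = 10) -> list:
    """Send the query to Solr and return the list of matching documents."""
    url = build_select_url(base_url, core, user_query, rows)
    with urllib.request.urlopen(url) as resp:
        payload = json.load(resp)
    # Solr's JSON response keeps the hits under response -> docs.
    return payload["response"]["docs"]
```

Your web page would submit the search box contents to a backend route that calls something like search("http://localhost:8983/solr", "texts", user_input) and renders the returned documents as HTML. That way the browser never talks to Solr directly.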

Related

Recommended Approaches for building/designing a search engine for my website

I would like to build a search engine for my website so I can quickly find relevant content. I've done quite a few google searches, discovered ElasticSearch and Solr (which both sit on top of Lucene), and whoosh (python-based).
But are all of these search engines just building an "inverted-index" on top of the data? What are some other algorithmic approaches for getting higher quality searches?
I was intrigued by this blog post using collaborative filtering on top of Solr, which returns related search queries:
http://www.opensourceconnections.com/2013/08/25/semantic-search-with-solr-and-python-numpy/
Are there other common techniques that I should be aware of? Are there other libraries sitting on top of ElasticSearch/Solr that I could just plug into, and use "out-of-the-box"?
Any links or tips would be greatly appreciated!
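For reference, the "inverted index" mentioned above is a map from each term to the set of documents that contain it. A toy sketch in Python, with made-up function names; real engines such as Lucene add text analysis, relevance scoring, and compact on-disk postings on top of this idea:

```python
from collections import defaultdict


def build_inverted_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Map every lowercased token to the ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return index


def search_all_terms(index: dict[str, set[int]], query: str) -> set[int]:
    """Return ids of documents that contain every term of the query."""
    postings = [index.get(term, set()) for term in query.lower().split()]
    return set.intersection(*postings) if postings else set()
```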

You haven't mentioned what tech stack you are working on.
If you use Ruby on Rails, I would recommend Tire, which is a gem that gives a DSL wrapper over ElasticSearch. Essentially, it allows you to index your data in Elasticsearch.
For Rails, Sunspot is a very popular gem that people use to interface with Solr.
For .NET - SolrNET is a great Solr client.
The other part of your question (around implementing a good search engine) is too broad. I would recommend reading a good book such as Lucene in Action to get a feel for what Solr/Elasticsearch can do.
I do have a few notes that I wrote a while back, you can read about some of my experience in search here.
Edit:
Since you work in Python, I would recommend Haystack, although it is specific to Django. It is very versatile for our needs, and it works with both Solr and Elasticsearch. If you are not using Django, I can think of solrpy as a Solr client.

I suggest you learn the Solr API. It has been in development for 4 or 5 years, so you can find lots of plugins for it, such as a related-search API. Elasticsearch is very easy to configure, but it is a very young engine and still needs more development.

Pyes is a well-documented Python client for Elasticsearch.
Also, this Youtube video provides a good overview of using Elasticsearch with Python.

I suggest you use Google Custom Search Engine.
Have a look here:
https://www.google.com/cse/all

We have developed several search engines both on Solr and Elastic. Solr used to be the best as it provided most of the tools needed to admin and debug your indexes. Right now Elastic offers the same features as Solr either natively or via plugins. Plus it is easier to configure in high performance/high availability scenarios (easy to shard or cluster).
Your technology stack is irrelevant. Both Solr and Elastic have clients for nearly every language, and you can access both via plain HTTP.
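As an illustration of the plain-HTTP route, here is a rough Python sketch that posts a match query to Elasticsearch's _search endpoint using only the standard library. The host, index name, and field name are placeholders, and the helper names are invented for this example.

```python
import json
import urllib.request


def build_search_request(base_url: str, index: str, field: str, text: str) -> urllib.request.Request:
    """Prepare a POST to /<index>/_search with a simple match query."""
    body = {"query": {"match": {field: text}}}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_search(request: urllib.request.Request) -> list:
    """Send the request and return the raw hits from the JSON response."""
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)["hits"]["hits"]
```

For example, run_search(build_search_request("http://localhost:9200", "books", "title", "lucene")) would query a local node, assuming one is running with a "books" index.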
That said, each search engine applies to a problem domain. Tuning Elastic or Solr to retrieve relevant results is a bit of an art, with some trial and error.
You will have to define analyzers for each field you search on, according to your search patterns and the kind of results you expect.
Eventually, to create a search engine with a single input that searches across disparate attributes of a document type, you may need DisMax queries, which let you boost results depending on which document fields the search terms match.
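To give a feel for what a DisMax query looks like on the Solr side, here is a small Python sketch that builds the request parameters. defType, q, and qf are Solr's parameter names; the field names "title" and "body" and the boost values are assumptions picked for illustration.

```python
import urllib.parse


def build_dismax_params(user_query: str, field_boosts: dict[str, float]) -> str:
    """Build a Solr query string that searches several fields with boosts."""
    # qf lists the fields to search; "title^3" weights title matches
    # three times as heavily as an unboosted field.
    qf = " ".join(f"{field}^{boost}" for field, boost in field_boosts.items())
    return urllib.parse.urlencode({"defType": "dismax", "q": user_query, "qf": qf})
```

Calling build_dismax_params("lucene in action", {"title": 3, "body": 1}) yields a query string you can append to a core's /select URL. The right boosts are found by the trial and error mentioned above.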
To summarize: go for Elastic, and get some plugins or frontends. Two suggestions:
Inquisitor: for testing your analyzers
Elastic Head: for administration purposes

What skill set is needed to set up Solr or ElasticSearch?

Two clients of mine are evaluating setting up a search server, either Solr or ElasticSearch. We're wondering what programming languages (if any) and development environments are necessary to get the search servers running. Can it be done by people mostly familiar with front end technologies (HTML/CSS/JavaScript) or is more serious coding skill needed (e.g. understanding of multithreading/ advanced debugging/ other "pro-level" concepts)?
If only light programming skills are needed I'm playing with the thought of suggesting to set it up myself. I have very little Java knowledge but have basic understanding of C, ActionScript, Pascal and even Simula in addition to aforementioned front end technologies. I know basic search architecture from my time in FAST (an enterprise search vendor).
Best, Bjørn

Bit of a broad question, but I'll give it a shot:
You don't need any programming language in particular. They're both standalone servers with APIs that are addressable from any programming language.
ElasticSearch has a really nice API that's JSON/REST based.
SOLR's API is a lot more clunky, but also supports XML.
(If I have a choice I tend to go for ElasticSearch, unless there's a really specialized feature I need that's only in SOLR).
Getting up and running doesn't really require any knowledge of any programming language in particular.
The only time you NEED Java is when you end up needing custom plugins for SOLR/ElasticSearch itself.
You don't need any specific IDEs beyond those matching your programming language of choice.
When trying to figure out what's going on inside my Elasticsearch server, I do like elasticsearch-head:
http://mobz.github.io/elasticsearch-head/
Hope this helps.

As pointed out already, this is quite a broad question and will most likely get closed. But I'll give it a go too.
Both ElasticSearch and Solr are quite easy to get started with. They come as a zip/tar.gz archive that you can extract.
Both require a JVM, so you need Java set up.
Once set up, playing with either is quite easy; you do not need any advanced programming skills to play around with them. Solr comes with an Admin UI page that allows you to execute queries.
ElasticSearch has clients, as @Constantijin has pointed out. Elastic-head is an excellent choice.
You will need quite a detailed understanding of the Lucene ecosystem, its architecture, plugins etc. Given you have an understanding of another Search Engine, the concepts around indexing and text processing should be easy enough for you.
If you want to write something more advanced than the Admin UI and you can use JavaScript, you can use AjaxSolr to make Ajax requests to your Solr instance.
For ElasticSearch, you can try Elastic.js.

Elasticsearch is an open-source search engine built on top of Apache Lucene™, a full-text search-engine library. Lucene is arguably the most advanced, high-performance, and fully featured search engine library in existence today—both open source and proprietary.
However, Elasticsearch is much more than just Lucene and much more than “just” full-text search. It can also be described as follows:
A distributed real-time document store where every field is indexed and searchable
A distributed search engine with real-time analytics
Capable of scaling to hundreds of servers and petabytes of structured and unstructured data
I would like to add more details on how to use ElasticSearch with PHP; check out http://www.multidots.com/what-is-elasticsearch

How to integrate ElasticSearch with PHP?
By using curl, you can use ElasticSearch with your favorite programming language.
For PHP, you can find the client API on GitHub:
https://github.com/elastic/elasticsearch-php
Check out Best Article on Elasticsearch - http://www.multidots.com/what-is-elasticsearch
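The same HTTP interface that curl uses is reachable from any language. As a hedged sketch in Python, standard library only, here is how a document could be indexed with a PUT request. The /<index>/_doc/<id> path is the form used by recent Elasticsearch versions (older releases used a type name in place of _doc), and the host, index, and document below are placeholders.

```python
import json
import urllib.request


def build_index_request(base_url: str, index: str, doc_id: str, document: dict) -> urllib.request.Request:
    """Prepare a PUT that stores `document` under the given id."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/{index}/_doc/{doc_id}",
        data=json.dumps(document).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and return Elasticsearch's JSON reply."""
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)
```

For example, send(build_index_request("http://localhost:9200", "articles", "1", {"title": "Hello"})) would index one document on a local node, assuming one is running.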

searching and retrieving data using node.js?

I was wondering whether node.js is a good fit for searching a massive amount of data. I know its main use is for asynchronous scenarios like chat, FTP, real-time apps, etc. I was thinking of using node.js with MongoDB to search 300,000 book records for the library at my university, to see how it would fare as opposed to using PHP & MySQL. Any advice would be great, thanks.

Node.js would be a fine application interface for searching data .. but practically, so would PHP or many other languages :).
Your backend data storage solution (MySQL, MongoDB, ..) is a harder choice and really depends on how you want to index and search the data.
If your main goal is search you probably want to look into a search application based on something like Apache Lucene. These typically use a relational database backend, although some newer efforts like ElasticSearch do have growing community support for ingesting data from sources like MongoDB (ref: MongoDB River Plugin for ElasticSearch).
Since you mentioned book search and libraries, you might also want to look into ILS (Integrated Library System) applications, which may already solve that problem. There are several open source products such as Koha and Evergreen.

Look at MongooseJS
Absolute perfect fit in my opinion.

Zoom Search Engine-like search engine, but for Linux/UNIX

I recently found the Zoom Search Engine, which struck me as quite interesting, since its software allows for easy decoupling of the indexing process and the searching process.
In other words, you run the indexer on your local machine, and then you upload this index, plus the PHP files that search it, to your webserver.
So your webserver doesn't have to do the indexing. I have a host in a shared environment where it's best to use as few resources as possible, so this would be great for me. Moreover, I have a mostly unused small server at home (this is not the webserver I have) that I could use for indexing purposes.
However, it runs Linux, SSH only, so the Zoom Search Engine is not an option.
Is there something that has the same principle as the Zoom Search Engine (index locally, upload index + PHP to website), but available for a command line Linux environment?

My recommendation is to have a look at OpenSearchServer, a Lucene-based search engine. It is easy to set up, mature, and stable.
For your requirements:
OpenSearchServer supports Linux and Windows.
SSH is enough for running OpenSearchServer remotely.
You can crawl the website locally and update the index (the data directory of OpenSearchServer) on your remote machine through replication or FTP. For a larger index, replication is the best option.
It has a PHP client library, so you can easily enable search in your existing or new application.

SPHINX SEARCH SERVER: http://sphinxsearch.com/
It fulfills all your needs and is also used by some popular sites like Craigslist, MySQL, etc.
PHP is very inherent to Sphinx. All the interfaces are in PHP, with the actual engine written in C++. It's blazing fast.
I myself use Solr/Lucene but I give Sphinx +1 for your tasks.

Solr / Lucene / Search Hosting

I need some sort of hosted search API for my website where I can submit content and search content with fuzzy logic, where spelling mistakes and grammar won't affect results.
I want to use solr/lucene or whatever technology is out there, without needing to install stuff on my server to reduce setup complexity.
What solr/lucene/othersearch hosting services are there?
I've read some other posts on Stack Overflow, but the services they mention are either no longer in business or are WordPress extensions that require server installation (i.e. the processing is done on the server).

You might consider Websolr, of which I am a cofounder, which is exactly the sort of service that you describe.

The thing is, Solr is highly dependent on its data model. Or rather, how your users search will really affect the way you structure the data model in Solr. As far as I know, there aren't any really good hosting services for Solr yet, because you almost always need to make extensive modifications to the Solr configuration (most notably schema.xml).
That said, Solr is really easy to get up and running. The example application is bundled with Jetty and runs more or less directly after download.
So unless you have immense scaling issues (read: 5-10+ million documents or a really high queries-per-second load), I'd recommend you actually install the application on your own server.
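On the "spelling mistakes shouldn't affect results" requirement from the question: whichever way Solr ends up hosted, this is commonly expressed with Lucene's fuzzy operator, where a trailing ~N on a term allows up to N character edits. A minimal Python sketch with an invented helper name; it does not escape Lucene's special characters, which real user input would need.

```python
def to_fuzzy_query(user_input: str, max_edits: int = 1) -> str:
    """Append Lucene's fuzzy operator (~N) to every whitespace-separated term."""
    if max_edits not in (0, 1, 2):
        raise ValueError("Lucene fuzzy queries accept an edit distance of 0, 1, or 2")
    return " ".join(f"{term}~{max_edits}" for term in user_input.split())
```

The resulting string can be passed as the q parameter of a Solr query, so a misspelled "serch" can still match documents containing "search".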

Amazon CloudSearch is the best alternative if you do not want to worry about hosting.
http://aws.amazon.com/cloudsearch/
http://docs.amazonwebservices.com/cloudsearch/latest/developerguide/SvcIntro.html

gotosolr - http://gotosolr.com/en
Apache Solr indexes are distributed on 2 hosting companies.
Security is managed by Https and basic http authentication.
Real-time statistics.
Also ready for agencies with multi-accounts and multi-subscriptions.
Supports Drupal and WPSOLR (https://wordpress.org/plugins/wpsolr-search-engine/)
