My company is looking at advanced search and reporting solutions and is considering (among other options) creating something akin to JIRA's JQL for maximum flexibility.
My googling leads me to believe Atlassian built JQL from scratch, at least as a language with its own syntax and parser, but I thought I'd try SO before concluding. Does anyone know, at a high level, how they did it? Were there one or more open source projects they based it on?
(Kudos to Atlassian either way - JQL is gorgeous!)
I think they did it from scratch. The underlying architecture is crisp but quite complex; it took me a good few hours to understand, just from reading the source and the minimal user docs.
Atlassian built JQL on top of Apache Lucene. You might want to take a look at Elasticsearch or Solr, which are open source alternatives, also built on Lucene.
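For intuition only (this is not Atlassian's implementation): the core idea of a JQL-style language is a small grammar parsed into a Lucene/Elasticsearch-style query. A toy sketch in Python, where the grammar and field names are invented:

```python
# Toy sketch (NOT Atlassian's code): parse a tiny JQL-like language
# such as `project = FOO AND status = "Open"` into an
# Elasticsearch-style bool query. Assumes whitespace-separated tokens
# and supports only `field = value` clauses joined by AND.
import re

TOKEN = re.compile(r'"[^"]*"|=|\S+')

def parse(jql):
    """Parse `field = value (AND field = value)*` into a bool query."""
    tokens = TOKEN.findall(jql)
    clauses, i = [], 0
    while i < len(tokens):
        field, eq, value = tokens[i], tokens[i + 1], tokens[i + 2]
        assert eq == "="
        clauses.append({"term": {field: value.strip('"')}})
        i += 3
        if i < len(tokens):
            assert tokens[i].upper() == "AND"  # only AND in this toy grammar
            i += 1
    return {"query": {"bool": {"must": clauses}}}

print(parse('project = FOO AND status = "Open"'))
# {'query': {'bool': {'must': [{'term': {'project': 'FOO'}},
#                              {'term': {'status': 'Open'}}]}}}
```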
I have been using Jira for a year and noticed "Apache Lucene" in its directory; before that, I had a job where I had to learn Apache Solr. In conclusion, Jira uses Apache Lucene as its search library, which is the same library Solr is built on.
For more info, read this:
http://www.lucenetutorial.com/lucene-vs-solr.html
Related
I would like to build a search engine for my website so I can quickly find relevant content. I've done quite a few Google searches and discovered Elasticsearch and Solr (which both sit on top of Lucene), as well as Whoosh (Python-based).
But are all of these search engines just building an "inverted-index" on top of the data? What are some other algorithmic approaches for getting higher quality searches?
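For context, an inverted index boils down to mapping each term to the set of documents containing it, so queries become set intersections rather than full scans. A toy sketch (real engines add scoring, positions, and compression on top):

```python
# Minimal sketch of the "inverted index" idea Lucene-based engines share:
# map each term to the set of documents containing it.
from collections import defaultdict

docs = {
    1: "the quick brown fox",
    2: "the lazy dog",
    3: "the quick dog jumps",
}

index = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.lower().split():
        index[term].add(doc_id)

def search(query):
    """AND-match: return docs containing every query term."""
    term_sets = [index[t] for t in query.lower().split()]
    return set.intersection(*term_sets) if term_sets else set()

print(search("quick dog"))  # {3}
```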
I was intrigued by this blog post using collaborative filtering on top of Solr, which returns related search queries:
http://www.opensourceconnections.com/2013/08/25/semantic-search-with-solr-and-python-numpy/
Are there other common techniques that I should be aware of? Are there other libraries sitting on top of ElasticSearch/Solr that I could just plug into, and use "out-of-the-box"?
Any links or tips would be greatly appreciated!
You haven't mentioned what tech stack you are working on.
If you use Ruby on Rails, I would recommend Tire, which is a gem that gives a DSL wrapper over ElasticSearch. Essentially, it allows you to index your data in Elasticsearch.
For Rails, Sunspot is a very popular gem that people use to interface with Solr.
For .NET, SolrNet is a great Solr client.
The other part of your question (about implementing a good search engine) is too broad. I would recommend reading a good book such as Lucene in Action to get a feel for what Solr/Elasticsearch can do.
I do have a few notes that I wrote a while back; you can read about some of my experience with search here.
Edit:
Since you work in Python, I would recommend Haystack, although it is specific to Django; it has been very versatile for our needs. If you are not using Django, solrpy is a Solr client you could use instead. Haystack works with both Solr and Elasticsearch.
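For what it's worth, a minimal sketch of what using solrpy looks like, assuming a Solr instance at the default local URL; the field names are illustrative and must exist in your Solr schema:

```python
# Minimal solrpy sketch: index a document and query it back.
# Assumes Solr at the default URL; field names are illustrative.
import solr

conn = solr.SolrConnection('http://localhost:8983/solr')

# Index a document (fields must be defined in your Solr schema).
conn.add(id='doc-1', title='Hello Solr', text='full text goes here')
conn.commit()

# Query it back.
response = conn.query('title:hello')
for hit in response.results:
    print(hit['id'], hit.get('title'))
```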
I suggest you learn the Solr API; it has been in development for four or five years, so you can find lots of plug-ins, such as a related-search API. Elasticsearch, by contrast, is very easy to configure, but it is a very young engine and still needs to mature.
Pyes is a well-documented Python client for Elasticsearch.
Also, this YouTube video provides a good overview of using Elasticsearch with Python.
I suggest you use Google Custom Search Engine. Have a look here:
https://www.google.com/cse/all
We have developed several search engines on both Solr and Elastic. Solr used to be the best, as it provided most of the tools needed to administer and debug your indexes. Right now Elastic offers the same features as Solr, either natively or via plugins, and it is easier to configure in high-performance/high-availability scenarios (easy to shard or cluster).
Your technology stack is irrelevant. Both Solr and Elastic have clients for nearly every language, and you can access both via plain HTTP.
That said, each search engine applies to a problem domain, and tuning Elastic or Solr to retrieve relevant results is a bit of an art, with some trial and error.
You will have to define analyzers for each field you'll search on, according to your search patterns and the kind of results you expect.
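For example, in Elasticsearch an analyzer is declared in the index settings and attached to a field in the mapping. A hedged sketch over the plain HTTP API (the index and field names are made up, and the exact mapping syntax varies across Elasticsearch versions):

```python
# Sketch: define a custom analyzer and attach it to a field via
# Elasticsearch's REST API. Index/field names are made up, and the
# mapping syntax differs across Elasticsearch versions.
import requests

settings = {
    "settings": {
        "analysis": {
            "analyzer": {
                "english_text": {          # custom analyzer name (arbitrary)
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "porter_stem"],
                }
            }
        }
    },
    "mappings": {
        "properties": {
            "description": {"type": "text", "analyzer": "english_text"}
        }
    },
}

resp = requests.put("http://localhost:9200/files", json=settings)
print(resp.json())
```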
Finally, to create a search engine with a single input that searches across disparate attributes of a document type, you may need DisMax queries, which let you boost results depending on how the search terms match specific document fields.
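As an illustration, a DisMax request against Solr's HTTP API might look like the following sketch (the core and field names are hypothetical); `qf` lists the fields to search, and the `^` boosts make title matches count more than description matches:

```python
# Sketch: a DisMax query over Solr's HTTP API. Core and field names
# are hypothetical; `qf` boosts title matches over description matches.
import requests

params = {
    "defType": "dismax",
    "q": "annual report",
    "qf": "title^3 description",  # search both fields, boost title 3x
    "rows": 10,
    "wt": "json",
}
resp = requests.get("http://localhost:8983/solr/files/select", params=params)
for doc in resp.json()["response"]["docs"]:
    print(doc.get("id"), doc.get("title"))
```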
To summarize: go for Elastic, and get some plugins or frontends. Two suggestions:
Inquisitor: for testing your analyzers
Elastic Head: for administration purposes
Two clients of mine are evaluating setting up a search server, either Solr or ElasticSearch. We're wondering what programming languages (if any) and development environments are necessary to get the search servers running. Can it be done by people mostly familiar with front end technologies (HTML/CSS/JavaScript) or is more serious coding skill needed (e.g. understanding of multithreading/ advanced debugging/ other "pro-level" concepts)?
If only light programming skills are needed, I'm toying with the idea of offering to set it up myself. I have very little Java knowledge but a basic understanding of C, ActionScript, Pascal, and even Simula, in addition to the aforementioned front-end technologies. I know basic search architecture from my time at FAST (an enterprise search vendor).
Bit of a broad question, but I'll give it a shot:
You don't need any programming language in particular. They're both standalone servers with APIs addressable from any programming language.
ElasticSearch has a really nice API that's JSON/REST based.
Solr's API is a lot clunkier, but it also supports XML.
(If I have a choice I tend to go for Elasticsearch, unless there's a really specialized feature I need that's only in Solr.)
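To give a feel for the JSON/REST API mentioned above, here is a rough sketch in Python using plain HTTP; the index and field names are made up, and the URL shape (e.g. the `_doc` type) varies a bit across Elasticsearch versions:

```python
# Sketch of Elasticsearch's JSON/REST API: plain HTTP, no client
# library required. Index and field names are made up.
import requests

base = "http://localhost:9200"

# Index a document.
requests.put(f"{base}/articles/_doc/1",
             json={"title": "Hello", "body": "Elasticsearch speaks JSON over HTTP"})
requests.post(f"{base}/articles/_refresh")  # make it searchable immediately

# Search using the query DSL.
resp = requests.get(f"{base}/articles/_search",
                    json={"query": {"match": {"body": "json"}}})
for hit in resp.json()["hits"]["hits"]:
    print(hit["_id"], hit["_source"]["title"])
```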
Getting up and running doesn't really require knowledge of any particular programming language.
The only time you NEED Java is when you end up needing custom plugins for Solr/Elasticsearch itself.
You don't need any specific IDEs beyond those matching your programming language of choice.
When trying to figure out what's going on inside my Elasticsearch server, I do like elasticsearch-head:
http://mobz.github.io/elasticsearch-head/
Hope this helps.
As pointed out already, this is quite a broad question and will most likely get closed. But I'll give it a go too.
Both ElasticSearch and Solr are quite easy to get started with. They come as zip/tar.gz archives that you can extract.
Both require the JVM, so you need Java set up.
Once set up, playing with either is quite easy; you do not need any advanced programming skills to play around with them. Solr comes with an Admin UI page that allows you to execute queries.
ElasticSearch has clients, as @Constantijin has pointed out. elasticsearch-head is an excellent choice.
You will need a fairly detailed understanding of the Lucene ecosystem: its architecture, plugins, etc. Given that you have an understanding of another search engine, the concepts around indexing and text processing should be easy enough for you.
If you want to write something more advanced than the Admin UI and you can use JavaScript:
You can use AjaxSolr for making ajax requests to your Solr instance
For ElasticSearch, you can try using Elastic.js.
Elasticsearch is an open-source search engine built on top of Apache Lucene™, a full-text search-engine library. Lucene is arguably the most advanced, high-performance, and fully featured search engine library in existence today—both open source and proprietary.
However, Elasticsearch is much more than just Lucene and much more than “just” full-text search. It can also be described as follows:
A distributed real-time document store where every field is indexed and searchable
A distributed search engine with real-time analytics
Capable of scaling to hundreds of servers and petabytes of structured and unstructured data
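The "distributed" bullets above are visible even in a toy example: an index is split into shards and replicated, and Elasticsearch spreads those shards over whatever nodes join the cluster. A hedged sketch (the index name is invented):

```python
# Sketch of the "distributed" idea: an index is split into shards and
# replicated. Here we request 3 primary shards with 1 replica each;
# Elasticsearch spreads them over whatever nodes join the cluster.
import requests

requests.put("http://localhost:9200/bigindex", json={
    "settings": {"number_of_shards": 3, "number_of_replicas": 1}
})
health = requests.get("http://localhost:9200/_cluster/health").json()
print(health["status"], health["active_shards"])
```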
I would like to add more details about how to use ElasticSearch from PHP; check out http://www.multidots.com/what-is-elasticsearch
How to integrate ElasticSearch with PHP?
By using curl (or any HTTP client), you can use ElasticSearch from your favorite programming language. Here is an example of a simple request to ElasticSearch.
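Since the original PHP snippet isn't reproduced here, the sketch below shows such a request in Python, with the curl equivalent in a comment; the index and field names are invented:

```python
# Sketch of a simple search request against Elasticsearch's REST API,
# the same call you would make with curl or any HTTP client.
#
# curl equivalent:
#   curl -XGET 'http://localhost:9200/myindex/_search' \
#        -d '{"query": {"match": {"content": "elasticsearch"}}}'
import requests

resp = requests.get(
    "http://localhost:9200/myindex/_search",
    json={"query": {"match": {"content": "elasticsearch"}}},
)
for hit in resp.json()["hits"]["hits"]:
    print(hit["_score"], hit["_source"])
```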
You can find the PHP client API on GitHub:
https://github.com/elastic/elasticsearch-php
We are working for a client to redesign an existing system that basically deals with a lot of files.
The files (more than 5 million) are currently stored on the server's filesystem. The client wants the new system to store the files in S3.
The files also have associated metadata (name, author's name, price, description, etc.).
The search functionality is also to be redesigned. The following are the basic requirements:
Full text search should be available on file descriptions.
Filtering should be possible on other attributes of files.
Also, based on the file description, the system should be able to give recommendations for similar files.
I do not have experience creating such a solution, so I'm asking for help and suggestions.
I was thinking along the lines of the following solutions:
Store the file metadata in MongoDB and use its search functionality (http://www.mongodb.org/display/DOCS/Full+Text+Search+in+Mongo).
Use Amazon DynamoDB. It provides an API to scan/query the dataset.
Use Lucene/Solr (I haven't worked with these yet; I still need to look deeper).
There is this project I found that is very similar to what I require:
http://www.thriftdb.com - on the home page it says it's a datastore with search built in.
Please let me know if this question should be a community wiki.
Thanks in advance.
You're in luck; this was announced today:
http://aws.amazon.com/about-aws/whats-new/2012/04/11/aws-announces-cloudsearch/
For searching files and filtering by attributes, the best would be the Sphinx search engine, which is used by FilesTube (Google was also using it years ago).
I don't know if it will work on Amazon servers.
Amazon has a custom AMI for Lucene/Solr and we have been happily using it in our projects. Lucene has a powerful indexing capability and executes at exceptional speeds. I would strongly recommend using Apache Lucene/Solr for all your search needs.
I am developing a site using the following technologies:
Ruby on Rails (Ruby 1.8.7, Rails 2.3.5)
Cassandra 0.6.8
I want to index the Cassandra database using Lucandra. How do I do this?
Is there a RESTful API or any web service available for this, so that I can push the data to the index database?
Please share any Ruby on Rails examples using Lucandra; that would really help us move forward. Or guide me through the steps to achieve this.
I have been googling for three days and have not found any examples of using Lucandra with Ruby on Rails.
Your help is appreciated in advance.
The Solandra project, which is replacing Lucandra, no longer uses Thrift, only Solr: http://github.com/tjake/Lucandra
This means you can use any of the Solr-supported gems, like acts_as_solr.
I recommend Elasticsearch. It has a REST API and Ruby and Rails clients:
https://github.com/angelf/escargot
https://github.com/grantr/rubberband
Elasticsearch is the most advanced free search solution in the world today. It is based on Lucene; it offers high availability, fault tolerance, partitioning, high performance, and scalability; it is state-of-the-art technology, open source, and simpler than Solr. Its success belongs to its author, Shay Banon, who has years of experience as an architect in this field. Solr (and Solandra) is nowhere near it. Simply investigate both and you'll see for yourself.
We have a web app that allows users to upload documents, create their own documents, and so on. Uploaded files are stored on Amazon S3, created information is stored in a MySQL database. What I'm looking for is some sort of search engine, where I feed it all of our text documents, each with a unique ID, and it builds an index or whatever. Later, I can give it search queries, and it will pull out the best matching documents (via their ID), along with snippets of matching text.
Basically, we want to allow our users to search through their repository of uploaded files, along with anything other users have marked as public. The solution should run on a standard Linux server, and ideally it would be open source, but I'll also consider paid solutions if they aren't outrageously priced.
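To make the desired workflow concrete, here is a hedged sketch using Whoosh (a pure-Python, Lucene-like library) purely as an illustration of the feed-documents-in, get-IDs-and-snippets-back shape; the paths and field names are made up:

```python
# Sketch of the desired workflow with Whoosh: feed documents with IDs,
# then get matching IDs plus highlighted snippets back.
# Paths and field names are made up.
import os
from whoosh import index
from whoosh.fields import ID, TEXT, Schema
from whoosh.qparser import QueryParser

schema = Schema(doc_id=ID(stored=True, unique=True),
                body=TEXT(stored=True))  # stored=True enables snippets

if not os.path.exists("indexdir"):
    os.mkdir("indexdir")
ix = index.create_in("indexdir", schema)

writer = ix.writer()
writer.add_document(doc_id=u"42", body=u"Quarterly report on search engines...")
writer.commit()

with ix.searcher() as searcher:
    query = QueryParser("body", ix.schema).parse(u"search engines")
    for hit in searcher.search(query, limit=10):
        print(hit["doc_id"], hit.highlights("body"))  # ID + snippet
```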
So far, I've found three potential candidates:
MySQL full-text search - some reports I've read say it's very slow
Apache Lucene - unfortunately written in Java, but I'll use it if I have to; supposedly fast
Sphinx - doesn't seem to be as popular; ideally, whatever solution I find will have lots of community support
Please let me know if there are any other good choices that I've overlooked, or if you have experience with any of the above.
Take a look at Solr. It's based on Lucene, so it's very fast, and it's really easy to use from any platform.
Sphinx may be worth your consideration, as it works well with several common RDBMSs (notably MySQL).
There is also Xapian, which is fast and quite customizable.
It supports custom indexers, allowing you to index data that is not stored in a database, which might be useful for your documents stored on S3.
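As an illustration of that, indexing text that lives outside a database (say, a description fetched from S3) with Xapian's Python bindings looks roughly like the following sketch; the paths are placeholders and the fetch step is elided:

```python
# Sketch: Xapian's Python bindings indexing text that does not live in
# a database (e.g. fetched from S3). The S3 key and description are
# placeholders for however you retrieve the data.
import xapian

db = xapian.WritableDatabase("xapian-index", xapian.DB_CREATE_OR_OPEN)
term_gen = xapian.TermGenerator()
term_gen.set_stemmer(xapian.Stem("en"))

def index_file(file_key, description):
    doc = xapian.Document()
    term_gen.set_document(doc)
    term_gen.index_text(description)
    doc.set_data(file_key)  # payload returned with matches
    db.add_document(doc)

index_file("s3://bucket/report.pdf", "Annual report on search engines")

# Query it back.
parser = xapian.QueryParser()
parser.set_stemmer(xapian.Stem("en"))
query = parser.parse_query("report")
enquire = xapian.Enquire(db)
enquire.set_query(query)
for match in enquire.get_mset(0, 10):
    print(match.docid, match.document.get_data())
```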
I imagine that Google will have a solution that meets your needs. Start here: Google Enterprise
There is a Ruby port of Lucene called "Ferret". In addition to the Ruby API, you can get at the underlying C implementation, called "cFerret".
Lucene is very good. And although it was originally written in Java, there is a PHP implementation: http://framework.zend.com/manual/en/zend.search.lucene.html