I recently came across faceted search and Lucene search, and I am confused about how they relate.
Could anyone explain the difference between faceted search and Lucene search, and in which scenarios each of them applies?
I am working with GraphQL. Is there any GraphQL client that provides a faceted search feature?
Thanks in advance
Faceted search is a kind of search supported by Lucene that lets you narrow the results to a particular subset of the data by category. Lucene also provides 'normal' query searching, which searches through all the documents without bias and returns the results.
Two good posts that explain faceted search well -
Faceted Search with Solr
Faceted Search - User's Guide
Faceted search is the dynamic clustering of items or search results into categories that let users drill into search results (or even skip searching entirely) by any value in any field. Each facet displayed also shows the number of hits within the search that match that category. Users can then “drill down” by applying specific constraints to the search results. - Lucidworks
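To make the definition above concrete, here is a minimal sketch in plain Python. It does not use Lucene or Solr; the records, the field names and the helper functions `search` and `facet_counts` are all made up for illustration. It shows the two ideas in the quote: counting hits per category value, and drilling down by applying a constraint.

```python
from collections import Counter

# Hypothetical product records; the field names are illustrative only.
ITEMS = [
    {"name": "Trail shoe", "brand": "Acme", "color": "red"},
    {"name": "Road shoe", "brand": "Acme", "color": "blue"},
    {"name": "Trail boot", "brand": "Borealis", "color": "red"},
]

def search(items, text, constraints=None):
    """Return items whose name contains `text` and that match every facet constraint."""
    constraints = constraints or {}
    return [
        item for item in items
        if text.lower() in item["name"].lower()
        and all(item.get(field) == value for field, value in constraints.items())
    ]

def facet_counts(hits, field):
    """Count how many hits fall under each value of `field`."""
    return Counter(hit[field] for hit in hits)

hits = search(ITEMS, "trail")
brand_facet = facet_counts(hits, "brand")             # one count per brand among the hits
drilled = search(ITEMS, "trail", {"brand": "Acme"})   # drill down on one facet value
```

A real engine computes these counts from its index rather than by scanning records, but the result shown to the user has the same shape: a value and a hit count per facet.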
Also, check out these examples provided by the Lucene developers.
If you want to go in depth into Lucene's architecture, or just want a reference, this is a good paper: Architecture and Implementation of Apache Lucene. See the search section (i.e. 2.2.7) for index searching. Here is a bit more background on Lucene's index searching:
Lucene is able to achieve fast search responses because, instead of searching the text directly, it searches an index instead. This would be the equivalent of retrieving pages in a book related to a keyword by searching the index at the back of a book, as opposed to searching the words in each page of the book.
This type of index is called an inverted index, because it inverts a page-centric data structure (page->words) to a keyword-centric data structure (word->pages).
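The book analogy can be sketched in a few lines of Python. This is only an illustration of the inversion step (page->words becoming word->pages), not Lucene's actual data structure; the page contents and the `invert` helper are invented for the example.

```python
from collections import defaultdict

# Page-centric data: page number -> words on that page (made-up content).
PAGES = {
    1: "lucene builds an index",
    2: "the index maps words to pages",
    3: "search the index not the pages",
}

def invert(pages):
    """Turn page -> words into word -> sorted list of pages."""
    index = defaultdict(set)
    for page, text in pages.items():
        for word in text.split():
            index[word].add(page)
    return {word: sorted(page_set) for word, page_set in index.items()}

INDEX = invert(PAGES)
pages_for_index = INDEX["index"]   # pages that contain the word "index"
```

Looking up a word is now a single dictionary access instead of a scan over every page.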
Generally, Lucene supplies components to search inside the index and to obtain hits for the searched query. QueryParser and IndexSearcher are the main components involved in most Lucene-based search engines. After the index has been constructed with postings lists, the search application looks up the user query in the index. It first analyzes the user query using the same analyzer as in the indexing process, then transforms the user query into a Query object according to the Lucene query language.
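As a rough illustration of why the query goes through the same analyzer as the documents, here is a toy version in plain Python. The names `analyze`, `build_index` and `search` are hypothetical and are not Lucene's QueryParser or IndexSearcher API; the analyzer is assumed to just lowercase and split on non-alphanumeric characters.

```python
import re
from collections import defaultdict

def analyze(text):
    """Toy analyzer: lowercase and split on non-alphanumeric characters."""
    return [token for token in re.split(r"[^a-z0-9]+", text.lower()) if token]

def build_index(docs):
    """Map each analyzed token to the set of document ids containing it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for token in analyze(text):
            postings[token].add(doc_id)
    return postings

def search(postings, user_query):
    """Analyze the query like the documents, then intersect the postings (AND semantics)."""
    tokens = analyze(user_query)
    if not tokens:
        return set()
    result = set(postings.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= postings.get(token, set())
    return result

DOCS = {1: "Fast Search with Lucene", 2: "Lucene index internals", 3: "Fast cars"}
POSTINGS = build_index(DOCS)
```

Because both sides are normalised the same way, a query such as "LUCENE, fast!" matches document 1 even though its casing and punctuation differ from the indexed text.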
Can somebody explain in detail the difference between the SEARCH and FILTER keywords?
I have already gone through https://www.arangodb.com/learn/search/tutorial/ -> SEARCH vs FILTER
Does anybody have any other experience with the difference?
Thanks,
Nilotpal
FILTER corresponds to the WHERE clause in SQL. It does what the name says: it uses all sorts of arithmetic and AQL operators to filter the result. It can make use of regular indexes. There is no ranking of filtered results. Filters operate on single-collection result sets.
SEARCH offers a full-fledged search engine, much like what you would get from regular search engines such as Google: ranking based on a grammar that you can formulate on your own, operating on the contents of multiple collections. Its most natural use is full-text search and ranking. In that use it is a much more powerful version of the full-text index. But it can do much more: normalisation, tokenisation based on language ...
The list goes on and on. Please refer to the documentation of search here:
https://www.arangodb.com/docs/stable/arangosearch.html
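The distinction can be pictured with a small Python analogy. This is not AQL and does not use ArangoDB; the documents, the scoring rule (raw term frequency) and the helpers `filter_docs` and `search_docs` are assumptions made only to contrast an unranked boolean filter with a ranked search.

```python
DOCS = [
    {"id": 1, "year": 2019, "text": "graph database with graph traversal"},
    {"id": 2, "year": 2021, "text": "document database"},
    {"id": 3, "year": 2021, "text": "graph search engine"},
]

def filter_docs(docs, predicate):
    """FILTER-like: keep documents that satisfy a boolean predicate; no ranking."""
    return [doc["id"] for doc in docs if predicate(doc)]

def search_docs(docs, query):
    """SEARCH-like: score documents by how often query terms occur, best first."""
    terms = query.lower().split()
    scored = []
    for doc in docs:
        words = doc["text"].lower().split()
        score = sum(words.count(term) for term in terms)
        if score > 0:
            scored.append((score, doc["id"]))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored]

filtered = filter_docs(DOCS, lambda doc: doc["year"] == 2021)
ranked = search_docs(DOCS, "graph")
```

The filter returns matching documents in their stored order, while the search orders them by relevance, so the document mentioning "graph" twice comes first.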
I understand the theory behind inverted indexes and indexes in general. Primarily, Solr indexes documents using an inverted index (searching tokens instead of documents).
I've also read that Solr uses indexing for features such as facets.
As I understand it, searching for a term and creating facets would require Solr to search all the terms in a field and match all the retrieved documents containing the search term, which would be costly, so indexing is used.
From what I understand, the index is used to retrieve all the documents matching the search terms; those documents are then traversed, and a count of the unique values of the facet fields is calculated.
Is this a correct understanding of the concept, or is there something else?
There is more than one way that faceting works in Solr.
Solr has a heuristic to select the best method, but there is also the
facet.method parameter to select it yourself.
Your description is mostly right, but Solr is fast because it caches the
UnInvertedField instead of selecting the values from the inverted index for each request.
With DocValues there is also an efficient way to store an uninverted field.
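A minimal sketch of the idea behind an uninverted field, in plain Python rather than Solr code: the inverted view (value -> documents) is turned once into a document -> value array, which can then be cached and reused to count facet values for any result set. The data, the single-valued field and the names `uninvert` and `facet` are assumptions for illustration only.

```python
from collections import Counter

# Inverted view of a hypothetical "category" field: value -> ids of documents holding it.
INVERTED = {"books": [0, 2], "music": [1], "games": [3, 4]}
NUM_DOCS = 5

def uninvert(inverted, num_docs):
    """Build a doc id -> value array once, so it can be cached and reused."""
    column = [None] * num_docs
    for value, doc_ids in inverted.items():
        for doc_id in doc_ids:
            column[doc_id] = value
    return column

CATEGORY_BY_DOC = uninvert(INVERTED, NUM_DOCS)   # computed once, reused per request

def facet(matching_doc_ids, column):
    """Count field values for the matching documents with one array lookup each."""
    return Counter(column[doc_id] for doc_id in matching_doc_ids)

counts = facet([0, 2, 3], CATEGORY_BY_DOC)
```

With the array in memory, faceting a result set costs one lookup per matching document instead of walking the postings of every term in the field.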
Possibly these answers will also help you:
How does Lucene/Solr achieve high performance in multi-field / faceted search?
Solr faceted search performance recommendations
http://de.slideshare.net/lucenerevolution/seeley-solr-facetseurocon2011
Not sure if the title is correct for the purpose, but what I want is to be able to search only a few documents (and not all of them) from a Lucene index.
Think about it in the following context:
The user wants to search inside a book, which is indexed in Lucene chapter by chapter (every chapter corresponds to a document). The user needs to be able to select the chapters he wants to search in, avoiding occurrences that are irrelevant to his study.
Is it possible to restrict the search to only some documents, or do I have to search the whole index and then filter the results?
Thank you!
Lucene allows you to apply query filters, so that you can restrict the results to those that match the filter criteria.
So basically you can search for chapter:chapter1 and the search will be limited to the chapter one documents.
Look at the QueryWrapperFilter. It will let you easily do this kind of thing.
Note however that this is more for ease of coding. This won't really help performance, because in the background, it's effectively searching the entire index, but it makes it easier to code "search within a search." Searching the entire index is not a problem because that's the whole purpose of an index--to make indexed searching extremely fast. This assumes that you have a book ID that is indexed, incidentally. If that is the case, then including the book ID in your search allows for very fast searches of the entire index for that particular book.
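The "search within a search" idea can be sketched without Lucene as follows. The chapter texts and the helper names are invented, and "chapter" is assumed to be the document id; the point is only that the term is looked up in the whole index and the hits are then intersected with the chapters the user selected.

```python
from collections import defaultdict

# Each chapter is one document; the chapter name doubles as the document id.
CHAPTERS = {
    "chapter1": "the whale appears at dawn",
    "chapter2": "the crew sights the whale",
    "chapter3": "a storm scatters the crew",
}

def build_postings(docs):
    """Map each word to the set of chapters that contain it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.split():
            postings[word].add(doc_id)
    return postings

POSTINGS = build_postings(CHAPTERS)

def search_in(term, allowed_chapters):
    """Look the term up in the whole index, then keep only the selected chapters."""
    return sorted(POSTINGS.get(term, set()) & set(allowed_chapters))

selected = search_in("whale", ["chapter2", "chapter3"])
```

The index lookup still covers every chapter, but since it is a single dictionary access the restriction costs little, which matches the point made above about performance.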
Nowadays, when starting a web/mobile app project in which search is going to be an important factor, is it better to go with Lucene from the start, or to quickly deploy a MySQL-based solution and hope for the best?
I faced the same decision in November 2010. I'm a fan of MySQL and first tried to build a search application on MySQL, which worked well...
...and fast (I thought it was fast): searching 200,000 documents took no more than 2-3 seconds.
I avoided spending time on Lucene/Solr because I wanted to use that time for developing the application. Also, Lucene was new to me: I didn't know whether it was good enough, or even what it was.
After all, you can't change the habits of a lifetime.
However, I ran into various problems with fuzzy search (which is difficult to implement in MySQL) and "more like this" (which has to be coded from scratch in an application using MySQL, whereas Solr offers a "more like this" feature out of the box).
Eventually the number of documents rose to a million, and MySQL then needed more than 15 seconds to search them.
So I decided to start with Lucene, and it felt like I had opened a door to a new world.
Lots of features (which I had coded myself as application features) are now provided by Solr and work out of the box. The full-text searches are much, much faster: less than 50 ms across 1 million documents, and less than 1 ms if the result is cached.
So the invested time has paid off.
So if you are thinking about building a full-text search: take Lucene if you have more than a small amount of data.
By the way, I'm using a hybrid construct: the data is held in MySQL, and Lucene is only an index with (nearly) no stored data (to keep that index small and fast).
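A rough sketch of that hybrid layout, under loudly stated assumptions: sqlite3 stands in for MySQL so the example runs without a server, a plain dictionary stands in for the Lucene index, and the table, the rows and the `search_titles` helper are made up. The index holds only tokens and ids; the rows themselves are fetched from the database.

```python
import sqlite3
from collections import defaultdict

# sqlite3 stands in for MySQL here so the sketch runs without a server.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, title TEXT, body TEXT)")
db.executemany(
    "INSERT INTO docs (id, title, body) VALUES (?, ?, ?)",
    [(1, "Intro", "lucene is a search library"),
     (2, "Storage", "mysql keeps the rows"),
     (3, "Hybrid", "search in lucene then load rows from mysql")],
)

# The "index" stores only tokens and ids, not the documents themselves.
index = defaultdict(set)
for doc_id, body in db.execute("SELECT id, body FROM docs"):
    for token in body.split():
        index[token].add(doc_id)

def search_titles(term):
    """Resolve the term to ids in the index, then fetch the full rows from the database."""
    ids = sorted(index.get(term, set()))
    if not ids:
        return []
    placeholders = ",".join("?" for _ in ids)
    rows = db.execute(
        f"SELECT title FROM docs WHERE id IN ({placeholders}) ORDER BY id", ids
    )
    return [title for (title,) in rows]

titles = search_titles("lucene")
```

Keeping only ids in the index keeps it small, at the cost of one extra primary-key lookup in the database per result page.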
Generally speaking, if you are going to have full-text searches, you will almost certainly need Lucene, or Sphinx + MySQL (or Lucene + MySQL, storing the indexable fields in Lucene and returning an id for a MySQL row). Either of them is an excellent choice.
If you are going to do "normal" searches (i.e. on integer, char or date columns), MySQL partitioning will suffice.
You need to specify what you are going to search for, and how often you will be reindexing your DB (if you are going to reindex a lot, I'd go with Sphinx).
You are asking whether to go with Lucene or MySQL. But Lucene is a library, and MySQL is a server. You should really be deciding between SOLR search engine and MySQL. In that case, the right answer is likely to be both. Manage all the data in MySQL. Run processes to regularly extract changed data, transform it into SOLR search format, and load it into the search engine. Using SOLR is much more straightforward than using Lucene directly, and if you need to modify the behavior in some way, you can still write plugins for SOLR so there is no loss of flexibility.
But it would be the kiss of death to try and manage data with SOLR. The cycle of read-edit-update works great with SQL dbs but it is not what SOLR is all about. SOLR is fast flexible text search. You can stick image URLs in SOLR for convenience of preparing search results using a non-indexed field.
PROBLEM:
I need to write an advanced search functionality for a website. All the data is stored in MySQL and I'm using Zend Framework on top. I know that I can write a script that takes the search page and builds an SQL query out of it, but this becomes extremely slow if there's a lot of hits. Then I would have to get down to the gritty details of optimizing the database tables/fields/etc. which I'm trying to avoid if possible.
Lucene: I gave Lucene a try, but since it's a full-text search engine, it does not allow any mathematical operators!! So if I wanted to get all the records where field_x > 5, there is no way to do it (correct?)
General Practice? I would like to know how large sites deal with this dilemma. Is there a standard way of doing this that I don't know about, or does everyone have to deal with the nasty details of optimizing the database at some point? I was hoping that some fast indexing/searching technology existed (e.g. Lucene) that would address this problem.
ANY OTHER COMMENTS OR SUGGESTION ARE MOST WELCOME!!
Thanks a lot guys!
Ali
You can use Zend Lucene for textual search, and combine it with MySQL for joins.
Please see Mark Krellenstein's Search Engine vs DBMS paper about the choice. Basically, search engines are better for ranked text search; databases are better for more complex data manipulations, such as joins, using different record structures.
For a simple x>5 type query, you can use a range query inside Lucene.
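To show why an index can answer a numeric condition such as field_x > 5 without a full scan, here is a conceptual sketch in plain Python. It is not Lucene's range query implementation; the records and the `greater_than` helper are hypothetical, and the "index" is simply the field values kept in sorted order and searched with binary search.

```python
from bisect import bisect_right

# Hypothetical records: doc id -> numeric field_x value.
FIELD_X = {"a": 3, "b": 7, "c": 5, "d": 12, "e": 6}

# "Index" the field once as (value, doc id) pairs sorted by value.
SORTED_PAIRS = sorted((value, doc_id) for doc_id, value in FIELD_X.items())
SORTED_VALUES = [value for value, _ in SORTED_PAIRS]

def greater_than(threshold):
    """Return ids with field_x > threshold using binary search, not a full scan."""
    start = bisect_right(SORTED_VALUES, threshold)
    return [doc_id for _, doc_id in SORTED_PAIRS[start:]]

matches = greater_than(5)
```

The binary search finds the first value above the threshold, and everything after that position matches.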
Use Lucene for your text-based searches, and use SQL for field_x > 5 searches. I say this because text-based search is hard to get right, and you're probably better off leaving that to an expert.
If you need your users to have the capability of building mathematical expression searches, consider writing an expression builder dialog like this example to collect the search phrase. Then use a parameterized SQL query to execute the search.
SqlWhereBuilder ASP.NET Server Control
http://www.codeproject.com/KB/custom-controls/SqlWhereBuilder.aspx
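A minimal sketch of the parameterized-query half of that suggestion, assuming the expression builder delivers (field, operator, value) triples. sqlite3 is used as a stand-in for the real database, and the table, the whitelists and the `build_where` helper are hypothetical. Field names and operators are checked against whitelists, and values only ever travel as bound parameters.

```python
import sqlite3

# Whitelists: only these column names and operators may reach the SQL text.
ALLOWED_FIELDS = {"field_x", "price"}
ALLOWED_OPERATORS = {">", ">=", "<", "<=", "="}

def build_where(conditions):
    """Turn (field, operator, value) triples into a WHERE clause plus parameters."""
    clauses, params = [], []
    for field, operator, value in conditions:
        if field not in ALLOWED_FIELDS or operator not in ALLOWED_OPERATORS:
            raise ValueError(f"unsupported condition: {field} {operator}")
        clauses.append(f"{field} {operator} ?")
        params.append(value)
    return " AND ".join(clauses), params

# sqlite3 stands in for the real database so the sketch is runnable.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, field_x INTEGER, price REAL)")
db.executemany("INSERT INTO records VALUES (?, ?, ?)",
               [(1, 3, 9.5), (2, 8, 20.0), (3, 6, 4.0)])

where, params = build_where([("field_x", ">", 5), ("price", "<", 10)])
ids = [row[0] for row in db.execute(
    f"SELECT id FROM records WHERE {where} ORDER BY id", params)]
```

The sketch does not handle an empty condition list; a real builder would need to omit the WHERE clause in that case.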
You can use filters in Lucene to carry out a text search over a reduced set of records. So you can query the database first to get all records where field_x > 5, build a filter (a list of Lucene document IDs) and pass it into the Lucene search method along with the text query. I'm just learning about this; here's a link to a question I asked (it uses Lucene.Net and C#, but it may help). Ignore my question and just check out the accepted answer:
How do you implement a custom filter with Lucene.net?