Lucene or MySQL full-text search [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 9 years ago.
Nowadays, when starting a web/mobile app project in which search is going to be an important feature, is it better to go with Lucene from the start, or to quickly deploy a MySQL-based solution and hope for the best?

I faced the same decision in November 2010. I'm a fan of MySQL, so I first tried to build a search application on MySQL, which worked well...
...and fast (or so I thought): searching 200,000 documents took no more than 2-3 seconds.
I avoided spending time on Lucene/Solr because I wanted to use that time for developing the application. Besides, Lucene was new to me: I didn't know whether it was good enough, or even what it really was.
In the end, you can't change the habits of a lifetime.
However, I ran into various problems with fuzzy search (which is difficult to implement in MySQL) and "more like this" (which has to be coded from scratch in an application using MySQL, whereas Solr provides a "more like this" feature out of the box).
Eventually the number of documents rose to a million, and MySQL needed more than 15 seconds to search them.
So I decided to start with Lucene, and it felt like I had opened a door to a new world.
Lots of features (I hardly had to code any application features) are now provided by Solr and work out of the box. Full-text searches are much, much faster: less than 50 ms over 1 million documents, and less than 1 ms if the result is cached.
So the invested time has paid off.
So if you are thinking about building a full-text search: take Lucene if you have more than a small amount of data.
By the way, I'm using a hybrid setup: the data is held in MySQL, and Lucene is only an index with (nearly) no stored data (to keep the index small and fast).
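The hybrid setup described above can be sketched roughly as follows. This is only an illustration: sqlite3 stands in for MySQL, a toy in-memory inverted index stands in for Lucene, and the table, column and function names are made up for the example.

```python
import sqlite3
from collections import defaultdict

# sqlite3 stands in for MySQL here; table and column names are hypothetical.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, title TEXT, body TEXT)")
db.executemany(
    "INSERT INTO documents (id, title, body) VALUES (?, ?, ?)",
    [
        (1, "Intro to Lucene", "Lucene is a full text search library"),
        (2, "MySQL basics", "MySQL stores the rows and the full documents"),
        (3, "Hybrid search", "Keep the data in MySQL and search with Lucene"),
    ],
)

# A toy inverted index standing in for Lucene: term -> set of document ids.
# Only ids are kept in the index; no document text is stored in it.
index = defaultdict(set)
for doc_id, body in db.execute("SELECT id, body FROM documents"):
    for term in body.lower().split():
        index[term].add(doc_id)


def search(query):
    """Return (id, title) rows for documents containing every query term."""
    terms = query.lower().split()
    if not terms:
        return []
    # Step 1: the index answers the text query with ids only.
    ids = set.intersection(*(index.get(t, set()) for t in terms))
    if not ids:
        return []
    # Step 2: the database supplies the actual data for those ids.
    placeholders = ",".join("?" for _ in ids)
    sql = f"SELECT id, title FROM documents WHERE id IN ({placeholders}) ORDER BY id"
    return db.execute(sql, sorted(ids)).fetchall()


results = search("lucene search")
```

The index stays small because it never holds the documents themselves; everything shown to the user comes from the database lookup in the second step.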

Generally speaking, if you are going to have full-text searches, you will almost certainly need Lucene or Sphinx + MySQL (or Lucene + MySQL, storing the indexable fields in Lucene and returning an ID for a MySQL row). Either of them is an excellent choice.
If you are going to do "normal" searches (i.e. on integer, char or date columns), MySQL partitioning will suffice.
You need to specify what you are going to search for, and how often you will be reindexing your DB (if you are going to reindex a lot, I'd go with Sphinx).

You are asking whether to go with Lucene or MySQL. But Lucene is a library, and MySQL is a server. You should really be deciding between SOLR search engine and MySQL. In that case, the right answer is likely to be both. Manage all the data in MySQL. Run processes to regularly extract changed data, transform it into SOLR search format, and load it into the search engine. Using SOLR is much more straightforward than using Lucene directly, and if you need to modify the behavior in some way, you can still write plugins for SOLR so there is no loss of flexibility.
But it would be the kiss of death to try and manage data with SOLR. The cycle of read-edit-update works great with SQL dbs but it is not what SOLR is all about. SOLR is fast flexible text search. You can stick image URLs in SOLR for convenience of preparing search results using a non-indexed field.
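The extract, transform and load cycle described in this answer might look roughly like the sketch below. It is not a definitive implementation: sqlite3 stands in for MySQL, the `articles` table, the `updated_at` column and the document field names are all hypothetical, and the final load step is left as a comment so the example runs without a search server.

```python
import json
import sqlite3

# sqlite3 stands in for MySQL; table, columns and field names are hypothetical.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT, body TEXT, "
    "image_url TEXT, updated_at INTEGER)"
)
db.executemany(
    "INSERT INTO articles VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Old post", "unchanged text", "http://example.com/1.png", 100),
        (2, "Edited post", "new text", "http://example.com/2.png", 250),
        (3, "Fresh post", "brand new", "http://example.com/3.png", 300),
    ],
)


def extract_changed(conn, last_sync):
    """Extract: rows modified since the previous sync run."""
    cur = conn.execute(
        "SELECT id, title, body, image_url, updated_at FROM articles "
        "WHERE updated_at > ? ORDER BY id",
        (last_sync,),
    )
    return cur.fetchall()


def to_search_docs(rows):
    """Transform: map each row to a flat document for the search index."""
    return [
        {"id": str(r[0]), "title": r[1], "body": r[2], "image_url": r[3]}
        for r in rows
    ]


rows = extract_changed(db, last_sync=200)
payload = json.dumps(to_search_docs(rows))
# Load: in a real setup this JSON would be sent to the search engine's
# update endpoint; that step is omitted here.
new_last_sync = max(r[4] for r in rows) if rows else 200
```

Remembering `new_last_sync` between runs is what keeps each run limited to changed rows, so MySQL remains the system of record and the search index is always rebuildable from it.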

Related

Yii2: How should site-wide search work?

What is the best-practice methodology for implementing site-wide search in Yii2?
This question is not about how to implement search specifically, but rather about what kind of approach to use. Should we use Sphinx? Elasticsearch? Or do we use UNION selects to get the data into a DataProvider?
Assume the application is using a relational database to store data. We want to search and display multiple different models. For example, our database contains tables of Books, Authors and Stores. When we search for a keyword we want to display results from all 3 tables (matching Books by title or content, Authors by full name and Stores by name etc).
There are tutorials which show how to use Elasticsearch but assume that our data is stored in the Elasticsearch database, which does not make sense. Our data is already stored in MySQL or PostgreSQL. Does this mean we need to maintain a duplicate of our data in the Elasticsearch database?
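For reference, the UNION approach mentioned in the question could be sketched as below. This is a minimal illustration using Python with sqlite3 in place of Yii2 and MySQL/PostgreSQL; the table layout and the `site_search` helper are assumptions made up for the example.

```python
import sqlite3

# sqlite3 stands in for MySQL/PostgreSQL; the schema is hypothetical.
db = sqlite3.connect(":memory:")
db.executescript(
    """
    CREATE TABLE books   (id INTEGER PRIMARY KEY, title TEXT, content TEXT);
    CREATE TABLE authors (id INTEGER PRIMARY KEY, full_name TEXT);
    CREATE TABLE stores  (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO books   VALUES (1, 'Dune', 'A desert planet'), (2, 'Emma', 'A novel');
    INSERT INTO authors VALUES (1, 'Frank Herbert'), (2, 'Jane Austen');
    INSERT INTO stores  VALUES (1, 'Dune Books'), (2, 'City Lights');
    """
)

# Each branch projects its table onto the same three columns so that the
# results can be merged into one list and fed to a single result view.
SITE_SEARCH = """
    SELECT 'book' AS kind, id, title AS label FROM books
     WHERE title LIKE :q OR content LIKE :q
    UNION ALL
    SELECT 'author', id, full_name FROM authors WHERE full_name LIKE :q
    UNION ALL
    SELECT 'store', id, name FROM stores WHERE name LIKE :q
    ORDER BY kind, id
"""


def site_search(keyword):
    """Return (kind, id, label) tuples matching the keyword in any table."""
    return db.execute(SITE_SEARCH, {"q": f"%{keyword}%"}).fetchall()


hits = site_search("dune")
```

This keeps everything in the relational database, at the cost of `LIKE '%...%'` scans that offer no ranking or stemming.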
What is the best-practice methodology for implementing site-wide search in Yii2?
That depends on many factors, so I can't give you a specific recommendation for your case. Some of the factors to think about are:
What would you like to achieve with this search? Is every little bit in your database a significant search term?
Do you need only full-text search or a wide range of analytics?
Do you have any limits on time or cost?
Can your (tech) infrastructure handle your ideas?
Is it worth bringing another extensive technology into the project?
Can you handle the additional maintenance tasks of running such a search engine?
And many more ...
In my internal Yii2 project with a PostgreSQL RDBMS, I decided to use PostgreSQL's text search type, tsvector. That's good enough for my needs. Why?
It supports stemming.
It supports fuzzy search.
It supports basic ranking.
It supports multiple languages.
I highly recommend the blog post "Postgres full-text search is Good Enough".

Storing big text data in database [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 9 years ago.
I am trying to build a blogging site (sort of). Users can write big blog posts (or text) and can also customise things like the font, size and colour of the text (a bit like posts on Stack Overflow, and a little more). I am looking to use MongoDB or Couchbase for the database part. I am confused about a few things:
Where should I store the blogs or posts: in the database or in text files? If in the database, how do I store the fonts, sizes and colours (users can have different fonts and sizes for different parts of a post)? Posts can sometimes be very big, so is it advisable to store such large texts in a database? Storing them as (text) files looks like the easier option, but I am worried about the performance of the site, as loading text files can be slow in websites. Just out of curiosity, how does Google store Google Docs files?
Should I use some other database that is better suited to handling the kind of things I mentioned?
Full-text search of the posts is not a feature I am looking into right now, but I might later, so please take that into consideration in your answer as well.
Please help me.
Honestly, MongoDB has been the best database for our NodeJS projects. Earlier versions had a 4 MB maximum BSON document size, but it has since been raised and is 16 MB in the latest versions. This is actually a fair amount of text. A 16 MB document is 16,777,216 bytes, so you should be able to store on the order of 16 million single-byte (ASCII) characters in it (less the document overhead; multi-byte UTF-8 characters take more space).
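As a rough way to check whether a given post fits under a 16 MB limit, one can measure its UTF-8 encoded size, since that is how strings are stored in BSON. The helper names and the overhead allowance below are assumptions made up for this sketch, not part of any MongoDB API.

```python
MAX_BSON_BYTES = 16 * 1024 * 1024  # 16 MB document limit discussed above


def utf8_size(text):
    """Number of bytes the text occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


def fits_in_one_document(text, overhead=1024):
    """Check the text against the limit.

    overhead is an arbitrary allowance for field names and other fields
    stored in the same document.
    """
    return utf8_size(text) + overhead <= MAX_BSON_BYTES


ascii_post = "a" * 1_000_000
accented_post = "é" * 1_000_000  # each "é" takes 2 bytes in UTF-8
```

A post of a million ASCII characters uses about 1 MB, while the same number of accented characters uses about twice that, so the character capacity depends on the text.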
Be aware that you are able to split up text into separate BSON documents very easily using GridFS.
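The idea behind GridFS is to store a large payload as a sequence of numbered chunks that are reassembled on read. The sketch below illustrates only that idea in plain Python; it is not the driver's API, the function names are hypothetical, and the 255 KB chunk size is used here as an assumed default.

```python
CHUNK_SIZE = 255 * 1024  # assumed chunk size for this illustration


def split_into_chunks(file_id, data, chunk_size=CHUNK_SIZE):
    """Split bytes into ordered chunk records that share one file id."""
    return [
        {"files_id": file_id, "n": n, "data": data[offset:offset + chunk_size]}
        for n, offset in enumerate(range(0, len(data), chunk_size))
    ]


def reassemble(chunks):
    """Rebuild the original bytes from chunks, in sequence-number order."""
    return b"".join(c["data"] for c in sorted(chunks, key=lambda c: c["n"]))


post = ("A very long blog post. " * 50_000).encode("utf-8")
chunks = split_into_chunks("post-1", post)
restored = reassemble(chunks)
```

Because each chunk is a small record of its own, no single stored document has to hold the whole text.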
I saw you were entertaining the idea of using flat files. While this may be easy and fast, you will have a hard time indexing the text for later use. MongoDB has the ability to index all your text and implementing search will be a fairly easy feature to add.
MongoDB is pretty fast and I have no doubt it will be the fastest database solution for you. Development in NodeJS + MongoDB has taken months off projects for my firm compared to SQL based databases. Also I have seen some pretty impressive performance reviews for it. Keep in mind as well that those performance reviews were last year and I have seen even more impressive reviews but that was what I could find easily today.

Ideas for full text search MongoDB & node.js [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 9 years ago.
I am developing a search engine for my website and I want to add the following features to it:
Full-text search
A "did you mean" feature
Data stored in MongoDB
I want to build a RESTful backend. I will be adding data to MongoDB manually and it will be indexed (which should I prefer: MongoDB indexing, or some other search indexing library such as Lucene?). I also want to use Node.js. This is what I have found from my research so far. Any ideas on the architecture would be appreciated.
Thanks in advance
I'm using Node.js / MongoDB / Elasticsearch (based on Lucene). It's an excellent combination. The flow is stunning as well, since all 3 packages (can) deal with JSON as their native format, so no need for transforming DTO's etc.
Have a look:
http://www.elasticsearch.org/
I personally use Sphinx and MongoDB; it is a great pair and I have no problems with it.
I back MongoDB onto a MySQL instance, which Sphinx just quickly indexes. You should never need to actively index _id, since it is unlikely that anyone will know the _id of one of your objects to search for, so you can just stash it in MySQL as a string field and it will work just fine.
When I pull the results back out of Sphinx, all I do is convert the ID to a new MongoId (in PHP), or in your case an ObjectId, and then simply query on this object ID for the rest of the data. It couldn't be simpler: no problems, no hassle, no nothing. And I can offload the work of reindexing delta indexes to my MySQL instance, keeping my MongoDB instance dealing with what it needs to: serving up tasty data for the user.
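For the "did you mean" feature listed in the question, a simple starting point is to compare the query against a list of known terms and suggest the closest one. The sketch below uses Python's standard `difflib` rather than anything from Lucene, Sphinx or Elasticsearch; the vocabulary, the cutoff value and the `did_you_mean` helper are assumptions for illustration only.

```python
import difflib

# Hypothetical vocabulary; in practice it could be built from indexed terms.
KNOWN_TERMS = ["lucene", "elasticsearch", "sphinx", "mongodb", "mysql"]


def did_you_mean(query, vocabulary=KNOWN_TERMS, cutoff=0.7):
    """Suggest the closest known term, or None if the query needs no fix."""
    word = query.lower()
    if word in vocabulary:
        return None  # exact match, nothing to suggest
    matches = difflib.get_close_matches(word, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else None


suggestion = did_you_mean("lucine")
```

A dedicated search engine would typically base suggestions on the indexed terms and their frequencies; this only shows the basic shape of the feature.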

Are there any technologies that help develop website search?

PROBLEM:
I need to write an advanced search functionality for a website. All the data is stored in MySQL and I'm using Zend Framework on top. I know that I can write a script that takes the search page and builds an SQL query out of it, but this becomes extremely slow if there's a lot of hits. Then I would have to get down to the gritty details of optimizing the database tables/fields/etc. which I'm trying to avoid if possible.
Lucene: I gave Lucene a try, but since it's a full-text search engine, it does not allow any mathematical operators!! So if I wanted to get all the records where field_x > 5, there is no way to do it (correct?)
General Practice? I would like to know how large sites deal with this dilemma. Is there a standard way of doing this that I don't know about, or does everyone have to deal with the nasty details of optimizing the database at some point? I was hoping that some fast indexing/searching technology existed (e.g. Lucene) that would address this problem.
ANY OTHER COMMENTS OR SUGGESTION ARE MOST WELCOME!!
Thanks a lot guys!
Ali
You can use Zend Lucene for textual search, and combine it with MySQL for joins.
Please see Mark Krellenstein's "Search Engine vs DBMS" paper about the choice. Basically, search engines are better for ranked text search, while databases are better for more complex data manipulations, such as joins and working with different record structures.
For a simple x>5 type query, you can use a range query inside Lucene.
Use Lucene for your text-based searches, and use SQL for field_x > 5 searches. I say this because text-based search is hard to get right, and you're probably better off leaving that to an expert.
If you need your users to have the capability of building mathematical expression searches, consider writing an expression builder dialog like this example to collect the search phrase. Then use a parameterized SQL query to execute the search.
SqlWhereBuilder ASP.NET Server Control
http://www.codeproject.com/KB/custom-controls/SqlWhereBuilder.aspx
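The parameterized-query part of this suggestion could be sketched as follows. This is an illustration in Python with sqlite3 rather than ASP.NET; the `records` table, the allowed field list and the `build_where` helper are hypothetical. Field names and operators come from a whitelist because placeholders can only carry values.

```python
import sqlite3

# Hypothetical whitelists: only these names may appear in generated SQL.
ALLOWED_FIELDS = {"field_x", "price"}
ALLOWED_OPERATORS = {">", ">=", "<", "<=", "=", "!="}


def build_where(conditions):
    """Turn (field, operator, value) triples into a WHERE clause and params."""
    clauses, params = [], []
    for field, op, value in conditions:
        if field not in ALLOWED_FIELDS or op not in ALLOWED_OPERATORS:
            raise ValueError(f"unsupported condition: {field} {op}")
        clauses.append(f"{field} {op} ?")  # the value travels as a parameter
        params.append(value)
    return (" AND ".join(clauses) or "1 = 1"), params


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, field_x INTEGER, price REAL)")
db.executemany(
    "INSERT INTO records VALUES (?, ?, ?)",
    [(1, 3, 9.5), (2, 7, 20.0), (3, 9, 4.0)],
)

where, params = build_where([("field_x", ">", 5), ("price", "<", 10)])
rows = db.execute(f"SELECT id FROM records WHERE {where} ORDER BY id", params).fetchall()
```

User-supplied values never end up in the SQL text itself, which is the point of using a parameterized query here.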
You can use filters in Lucene to carry out a text search of a reduced set of records. So if you query the database first to get all records where field_x > 5, build a filter (a list of lucene document IDs) and pass this into the lucene search method along with the text query. I'm just learning about this, here's a link to a question I asked (it uses Lucene.Net and C# but it may help) - ignore my question, just check out the accepted answer:
How do you implement a custom filter with Lucene.net?
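The filter idea from this answer can be illustrated without Lucene itself: ask the database which records satisfy the numeric condition, then restrict the text search to that set of IDs. In the sketch below sqlite3 stands in for MySQL, a toy term-to-IDs dictionary stands in for the Lucene index, and all names are hypothetical.

```python
import sqlite3
from collections import defaultdict

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, field_x INTEGER, text TEXT)")
db.executemany(
    "INSERT INTO records VALUES (?, ?, ?)",
    [(1, 2, "red apple pie"), (2, 8, "green apple tart"), (3, 9, "lemon tart")],
)

# Toy stand-in for the text index: term -> set of record ids.
text_index = defaultdict(set)
for doc_id, text in db.execute("SELECT id, text FROM records"):
    for term in text.lower().split():
        text_index[term].add(doc_id)


def filtered_search(term, min_x):
    """Text search limited to records where field_x > min_x."""
    # The database evaluates the numeric condition and yields the filter set.
    allowed = {
        row[0]
        for row in db.execute("SELECT id FROM records WHERE field_x > ?", (min_x,))
    }
    # The text match is then intersected with the filter.
    return sorted(text_index.get(term.lower(), set()) & allowed)


apple_hits = filtered_search("apple", 5)
```

With large tables the filter set itself can become big, so whether this beats a range query inside the search engine depends on the data.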

What is the best search approach?

I'm using Lucene in my project.
Here is my question:
Should I use Lucene to replace the whole search module, which is currently implemented in SQL using a large number of LIKE statements plus exact lookups by ID and similar fields,
or should I use Lucene only for fuzzy search (I mean full-text search)?
Probably you should use lucene, unless the SQL search is very performant.
We are right now moving to Solr (based on Lucene) because our search queries are inherently slow, and cannot be sped up with our database.... If you have reasonably large tables, your search queries will start to get really slow unless the DB has some kind of highly optimized free text search mechanisms.
Thus, let Lucene do what it does best....
I don't think overusing LIKE statements is a good idea.
And I believe Lucene will perform better than the database for this.
I'm actually very impressed by Solr, at work we were looking for a replacement for our Google Mini (it's woefully inadequate for any serious site search) and were expecting something that would take a while to implement. Within 30 minutes of installing Solr we had done what we had expected to take at least a few days and provided us with a far more powerful search interface than we had before.
You could probably use Solr to do quite a lot of clever things beyond a simple site search.

Resources