At the moment we have an ASP.NET application with search based on Lucene.Net 3.0.3, and we are going to implement a search service that works with more than 2.5 million items. We keep coming back to the same question: which search engine will be the quickest in this situation?
As we know, Lucene.Net is based on the classical Java Lucene principles, and ideally it should have almost the same speed. But we found that the Lucene.Net 3.0.3 engine has issues with the speed of fuzzy search.
We found some explanation of why fuzzy-search performance in Lucene is bad (on our data every request takes 6-8 seconds): Solr/Lucene fuzzy search too slow
Also, our speed issues with Lucene.Net 3.0.3 are described here.
So we have a list of questions for the Lucene community and all experienced IT pros:
Does it make sense to move from .NET to Java?
Do you see any other alternatives for working with such a big amount of data?
Do you have such experience, and can you share some numbers on Lucene fuzzy search? (We saw 4-8 seconds per search request against a 2.5 million-item index; see the link above for more details.)
Do you have experience with FlexLucene? Is it better than Lucene.Net?
Thank you.
I would suggest upgrading to a newer version of Lucene, as the performance of fuzzy search has been improved significantly (by leveraging finite state machines).
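For illustration, a minimal sketch of such a fuzzy query against an upgraded (4.0+) Java Lucene index; the index path and field name here are hypothetical:

```java
import java.nio.file.Paths;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.FuzzyQuery;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.FSDirectory;

public class FuzzySearchExample {
    public static void main(String[] args) throws Exception {
        try (DirectoryReader reader = DirectoryReader.open(
                FSDirectory.open(Paths.get("/path/to/index")))) { // hypothetical path
            IndexSearcher searcher = new IndexSearcher(reader);
            // Since Lucene 4.0, FuzzyQuery compiles the term into a Levenshtein
            // automaton instead of brute-forcing the whole term dictionary,
            // which is where the speedup comes from; maxEdits is capped at 2.
            FuzzyQuery query = new FuzzyQuery(new Term("title", "lucene"), 2);
            TopDocs hits = searcher.search(query, 10);
            System.out.println("Total hits: " + hits.totalHits);
        }
    }
}
```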
I'm pretty new to search engines and a complete newbie in machine learning. But I wanted to know if there is a way to combine the functionality of search engines like Elasticsearch or Apache Solr with machine learning projects like Apache Mahout, H2O, or PredictionIO.
For example, say you work on a travel website where users can search for a destination. A user starts typing "au", so the first suggestions are "AUstria", "AUstralia", "mAUrice island", "mAUritania", etc. This is typically what Elasticsearch can do.
But you know that this user has already travelled to Mauritania three times, so you want Mauritania to move to first place in the suggestions. And I guess that's typically what machine learning can do.
Are there bridges between these two types of technology? Can machine learning efficiently support the work of a search engine?
I'm open to all answers, regardless of the technologies used. If you have ever experienced this type of problem, my ears are wide open :-)
Thank you
Your question is very general in nature, so my answer will have to be the same.
Consider a recommender framework such as the correlated co-occurrence implementation in Apache Mahout. Unlike the vanilla Spark recommender, this implementation allows for multiple types of actions, such as viewing a web site, having booked a trip there before, demographic information, etc.
Now you would calculate the recommendations for each user at whatever interval. The recommendations are based on multiple criteria and on what other people similar to this user have done. Consider your 'items' in this case to be every destination in the world, so we now have every possible destination ranked for each user.
It is then a trivial extension to index into Elasticsearch, per user, the ordered list of that user's recommended destinations.
For example, we have a user who has visited Berlin, looked at several hotels in Vienna, and is from Romania. When the user types "au", we would expect "Austria" to come up in the results much higher than "Australia".
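One simple way to wire the two together is to re-rank on the application side: let the engine produce prefix matches and sort them by the per-user scores the recommender computed offline. A minimal sketch; all names and scores below are invented for illustration:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

public class PersonalizedSuggest {
    public static void main(String[] args) {
        // Candidates as returned by the search engine for the prefix "au".
        List<String> candidates =
                List.of("Austria", "Australia", "Mauritania", "Mauritius");

        // Per-user destination scores, precomputed offline by the recommender.
        Map<String, Double> userScores = Map.of("Mauritania", 0.9, "Austria", 0.4);

        // Re-rank: highest recommendation score first; the sort is stable,
        // so the engine's original order acts as the tiebreaker.
        List<String> reranked = new ArrayList<>(candidates);
        reranked.sort(Comparator.comparingDouble(
                (String d) -> userScores.getOrDefault(d, 0.0)).reversed());

        System.out.println(reranked); // [Mauritania, Austria, Australia, Mauritius]
    }
}
```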
Per the comments and downvotes: you probably should have either (a) asked a more specific programming question, or (b) asked this question on another forum such as Data Science Stack Exchange, FYI.
How do I implement a caching mechanism for search results, as on Stack Overflow?
How do Elasticsearch and Lucene deal with caching?
As of now, you can cache in two different ways within Elasticsearch:
Filter cache - If you offload into filters as many constraints as possible that don't take part in scoring the results, you get segment-level caches for each such filter alone. This, along with the warmer API, provides a decent amount of in-memory caching for the applied filters.
Shard request cache* - You can cache results (other than hits) at the query level. This is a pretty new feature and should provide a good amount of caching, but the _source of each hit still needs to be fetched from the shards.
Within Elasticsearch you can exploit these features to attain a good amount of caching.
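A sketch of the filter-based approach using the Elasticsearch Java query builders (the field names here are made up): constraints that do not influence scoring go into filter context, where their matching document sets can be cached.

```java
import org.elasticsearch.index.query.QueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;

public class FilterContextExample {
    // The full-text clause stays in query context (scored); the status and
    // date constraints sit in filter context, where Elasticsearch can cache
    // their matching doc sets per segment.
    static QueryBuilder buildQuery() {
        return QueryBuilders.boolQuery()
                .must(QueryBuilders.matchQuery("body", "caching search results"))
                .filter(QueryBuilders.termQuery("status", "published"))
                .filter(QueryBuilders.rangeQuery("created").gte("2016-01-01"));
    }
}
```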
Also, you can explore caching options external to Elasticsearch, such as memcached or other in-memory caches; a minimal sketch of that idea follows the notes below.
* Previously called the shard query cache.
Regarding warmers, the Elasticsearch documentation (5.4+) notes: "Warmers have been removed. There have been significant improvements to the index that make warmers not necessary anymore."
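As for the external option mentioned above, here is a minimal sketch: an in-process LRU cache keyed by the query string, standing in for memcached or a similar store. The result type and sizes are placeholders, and a real deployment would also need to invalidate entries when the index changes.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class SearchResultCache {
    private final int maxEntries;
    private final Map<String, List<String>> cache;

    public SearchResultCache(int maxEntries) {
        this.maxEntries = maxEntries;
        // An access-ordered LinkedHashMap gives a simple LRU eviction policy.
        this.cache = new LinkedHashMap<String, List<String>>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, List<String>> eldest) {
                return size() > SearchResultCache.this.maxEntries;
            }
        };
    }

    // Return cached results for the query, or run the search and cache them.
    // 'engine' is whatever actually talks to Elasticsearch/Lucene.
    public synchronized List<String> search(String query,
                                            Function<String, List<String>> engine) {
        return cache.computeIfAbsent(query, engine);
    }
}
```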
We are building a university website and are looking for a search solution for it. The website gets high traffic because the university has an open-education faculty, so there are a great many students (approximately 1.5 million). We already use caching to speed up the website. Given that, which search engine do you suggest for our situation?
Note: We are considering Solr, Elasticsearch, or Sphinx for now, but it could also be one of the others.
Update: We need a full-text search engine which must be fast and extendable, with features like query likening and support for indicating priority.
Thanks.
It really depends on your use case, what features you want, and whether you have any experience with any of the technologies. I could paraphrase the arguments, but there's a very good discussion covering the pros and cons of each here: ElasticSearch, Sphinx, Lucene, Solr, Xapian. Which fits for which usage?
Edit (in response to the question's edit):
Of these technologies I have only used Solr (and SQL), but I've found it easy to use and would recommend it. It supports native sharding and replication, which should cover the extendability issue. It also supports things like joins and field weighting, which I think covers all your needs, if I read your requirements correctly.
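As a rough illustration of the field-weighting point, a SolrJ query using the edismax parser to weight title matches over body matches; the core name and field names are hypothetical:

```java
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class WeightedSearchExample {
    public static void main(String[] args) throws Exception {
        try (HttpSolrClient solr = new HttpSolrClient.Builder(
                "http://localhost:8983/solr/university").build()) {
            SolrQuery query = new SolrQuery("open university enrollment");
            query.set("defType", "edismax"); // extended dismax query parser
            query.set("qf", "title^3 body"); // title matches weigh 3x body matches
            QueryResponse response = solr.query(query);
            System.out.println(response.getResults().getNumFound() + " documents found");
        }
    }
}
```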
I have heard a comment that using a DDL script for a database installation is anti-agile. Is this true, and if so, why? I have been looking online for an answer but couldn't find anything.
Thanks
I don't know your exact context, but I would say that DDL is very pro-agile, as it supports a repeatable installation process. Maybe they meant that designing the entire database up front, prior to development, is anti-agile. I would tend to agree with that assessment, but there is nothing inherently anti-agile about DDL.
Hope that helps!
Brandon
Structuring your DDL as incremental updates is more convenient and also allows for repeatable builds.
This allows the database to be "refactored" or to "evolve", as changes to it are treated as a series of small adjustments to a base schema.
It also allows constant upgrades without your having to manage database versioning explicitly yourself: the database is the source of its own version, and the upgrade process applies only those deltas that are required.
There are several tools that can help with this, the best known probably being Ruby on Rails' Active Record migrations. For Java environments, dbdeploy is pretty good (I think there are versions of dbdeploy for .NET and PHP environments too).
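A minimal sketch of the idea those tools implement, assuming a hypothetical schema_version table, one statement per numbered delta script, and an H2 database purely for illustration; real tools add locking, checksums, and rollback handling:

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class MigrationRunner {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:h2:./appdb");
             Statement stmt = conn.createStatement()) {
            // The database records its own version in a single table.
            stmt.execute("CREATE TABLE IF NOT EXISTS schema_version (version INT)");
            int current = 0;
            try (ResultSet rs = stmt.executeQuery("SELECT MAX(version) FROM schema_version")) {
                if (rs.next()) current = rs.getInt(1); // NULL maps to 0 on an empty table
            }
            // Apply only the deltas newer than the recorded version:
            // migrations/001.sql, migrations/002.sql, ...
            for (int v = current + 1;
                 Files.exists(Path.of(String.format("migrations/%03d.sql", v))); v++) {
                stmt.execute(Files.readString(Path.of(String.format("migrations/%03d.sql", v))));
                stmt.execute("INSERT INTO schema_version VALUES (" + v + ")");
            }
        }
    }
}
```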
My aim is to build an aggregator of news feeds and blog feeds so as to make searching and tracking of entities in them easy. I have been looking at many solutions out there, like Terrier, Lucene, SWISH-E, etc.
Basically, I could find only two comparison studies of these engines, and one of them is rather outdated. I want a search engine for a case in which the data size is not too large but indexing is frequent, every 30 minutes or so. I feel Terrier is not a good tool for this case: it works better when the data size is large and the update frequency is low. Can somebody who has worked in the information retrieval field offer some advice?
Lucene is well known and supported, so personally, that would be my first choice.
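Given your frequent-indexing requirement (every 30 minutes), here is a minimal sketch, assuming a recent Lucene version, of updating documents in place and opening a near-real-time reader so searches see new data quickly; the path and field names are hypothetical:

```java
import java.nio.file.Paths;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.store.FSDirectory;

public class FeedIndexer {
    public static void main(String[] args) throws Exception {
        FSDirectory dir = FSDirectory.open(Paths.get("/path/to/feed-index"));
        try (IndexWriter writer =
                     new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
            // Every 30 minutes: upsert changed feed entries, keyed by URL, so
            // re-indexing touches only what changed instead of rebuilding.
            Document doc = new Document();
            doc.add(new StringField("url", "http://example.com/post-1", Field.Store.YES));
            doc.add(new TextField("body", "article text goes here", Field.Store.NO));
            writer.updateDocument(new Term("url", "http://example.com/post-1"), doc);
            writer.commit();

            // A near-real-time reader sees the writer's changes without
            // waiting for a full index reopen.
            try (DirectoryReader reader = DirectoryReader.open(writer)) {
                // ... run searches against 'reader' here ...
            }
        }
    }
}
```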
If you want a ready-to-use search engine, check out fastcatsearch.
It has been developed for commercial search and applied to a lot of different sites.
It is faster than Lucene and has a web-based manager that is easy to use.
It is hosted on GitHub; check it out: https://github.com/fastcatgroup/fastcatsearch