How can search results be cached? [closed]

How do I implement a caching mechanism for search results, as on Stack Overflow?
How do Elasticsearch and Lucene deal with caching?

As of now, you can cache in two different ways within Elasticsearch:
Filter cache - If you offload as many constraints as possible that don't take part in scoring into filters, Elasticsearch can cache those filters at the segment level. Together with the warmers API, this provides a decent amount of in-memory caching for the filters alone.
Shard request cache (previously called the shard query cache) - You can cache results other than hits at the query level. This is a fairly new feature and should provide a good amount of caching, but the _source still has to be fetched from the shards.
Within Elasticsearch you can exploit these features to attain a good amount of caching; a sketch of both follows below. You can also explore caching options external to Elasticsearch, such as Memcached or other in-memory caches.

Warmers
Warmers have been removed in Elasticsearch 5.4+; there have been significant improvements to the index that make warmers unnecessary.
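
As a rough illustration of both points, here is a minimal sketch against a recent Elasticsearch version using Python's requests library; the endpoint, index name, and field names are assumptions. Non-scoring constraints go into the bool query's filter clause (so Elasticsearch can cache them), and the shard request cache is opted into per request via the request_cache parameter:

```python
import requests

# Hypothetical local node and index; adjust for your cluster.
ES = "http://localhost:9200"
INDEX = "articles"

body = {
    "query": {
        "bool": {
            # Scored part of the query (not cached).
            "must": [{"match": {"title": "caching"}}],
            # Non-scoring constraints: cacheable filter context.
            "filter": [
                {"term": {"status": "published"}},
                {"range": {"published_at": {"gte": "2015-01-01"}}},
            ],
        }
    },
    # The shard request cache only caches responses that return no hits
    # (e.g. size=0 counts/aggregations), matching the caveat above.
    "size": 0,
}

resp = requests.post(
    f"{ES}/{INDEX}/_search",
    params={"request_cache": "true"},  # opt in to the shard request cache
    json=body,
)
print(resp.json())
```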

Related

I have more than 3k rows of data to retrieve from Cassandra using an API. I have indexing on it, but it's causing a connection reset issue [closed]

I have more than 3k rows of data to retrieve from Cassandra using an API. I have indexing on it, but it's still causing a connection reset issue.
Should I look at another database for this?
Is a workaround possible in Cassandra?
Will providing a limit, or filtering between dates in the query, help?
(That would put a restriction on the API; is that standard practice?)
So there's a lot missing here that is needed to help diagnose what is going on. Specifically, it'd be great to see the underlying table definition and the actual CQL query that the API is trying to run.
Without that, I can say that to me, it sounds like the API is trying to aggregate the 3000 rows from multiple partitions with a specific date range in the cluster (and is probably using the ALLOW FILTERING directive to accomplish this). Most multi-partition queries will time-out, just because of all the extra network time being introduced while polling each node in the cluster.
As with all queries in Cassandra, a table needs to be built to support a specific query. If it's not, this is generally what happens.
Will providing a limit, or filtering between dates in the query, help?
Yes, breaking this query up into smaller pieces will help. If you can look at the underlying table definition, that might give you a clue as to the right way to properly query the table. But in this case, making 10 queries for 300 rows probably has a higher chance for success than 1 query for 3000 rows.
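
To illustrate "smaller pieces", here is a hedged sketch with the Python cassandra-driver; the events table, its day partition key, and the ks keyspace are hypothetical stand-ins for the unknown schema. Each request targets a single partition, and fetch_size adds driver-level paging on top:

```python
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

# Hypothetical schema:
# CREATE TABLE ks.events (day text, ts timestamp, payload text,
#                         PRIMARY KEY ((day), ts));
cluster = Cluster(["127.0.0.1"])
session = cluster.connect("ks")

# One single-partition query per day instead of one big
# ALLOW FILTERING query across the whole cluster.
stmt = SimpleStatement(
    "SELECT ts, payload FROM events WHERE day = %s",
    fetch_size=300,  # driver-level paging: 300 rows per round trip
)

rows = []
for day in ["2024-05-01", "2024-05-02", "2024-05-03"]:
    rows.extend(session.execute(stmt, (day,)))
print(len(rows))
```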

ArangoDB, what's the better way to perform queries? [closed]

What's better for retrieving complex data from ArangoDB: one big query with all the collection joins and graph traversals, or multiple queries, one for each piece of data?
I think it depends on several aspects, e.g. the operation(s) you want to perform, the scenario in which the query or queries will be executed, and whether you favor performance over maintainability.
AQL provides the ability to write a single non-trivial query that spans the entire dataset and performs complex operations. Dissolving a big query into multiple smaller ones might improve maintainability and code readability, but on the other hand, separate queries for each piece of data can hurt performance through the network latency associated with each request. One should also consider whether the scenario allows working with partial results returned from the database while the next batch of queries is being processed.
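
As a sketch of the single-query side of that trade-off, here is one AQL statement that combines a graph traversal with a collection join, sent through ArangoDB's HTTP cursor API; the users and items collections, the placed edge collection, and the credentials are all assumptions:

```python
import requests

ARANGO = "http://localhost:8529"  # hypothetical server
DB = "shop"                       # hypothetical database

# One round trip instead of one request per piece of data.
aql = """
FOR user IN users
  FILTER user._key == @userKey
  FOR order IN 1..1 OUTBOUND user placed   /* traversal over 'placed' edges */
    FOR item IN items
      FILTER item._key == order.itemKey    /* join against 'items' */
      RETURN { user: user.name, item: item.name, price: item.price }
"""

resp = requests.post(
    f"{ARANGO}/_db/{DB}/_api/cursor",
    json={"query": aql, "bindVars": {"userKey": "u123"}},
    auth=("root", "password"),
)
print(resp.json()["result"])
```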

The best search solution for a university website which has high traffic from time to time [closed]

We are building a university website and are looking for a search solution for it. Our university website gets high traffic because it includes the faculty of an open university, so there are a great many students (approximately 1.5 million). We even use caching to speed the website up. Anyway, which search engine would you suggest for our situation?
Note: We are considering Solr, Elasticsearch, or Sphinx for now, but it could also be one of the others.
Update: We need a full-text search engine which must be fast and extensible, with features like query similarity ("likening") and priority (boosting) support.
Thanks.
It really depends on your use case, what features you want, and whether you have any experience with any of the technologies. I could paraphrase the arguments, but there's a very good discussion in "ElasticSearch, Sphinx, Lucene, Solr, Xapian. Which fits for which usage?" that covers the pros and cons of each.
Edit (in response to the question's edit):
Of these technologies I have only used Solr (and SQL), but I've found it easy to use and would recommend it. It supports native sharding and replication, which should cover the extensibility requirement. It also supports things like joins and field weighting, which I think covers all your needs, if I read your requirements correctly.
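
For instance, field weighting in Solr can be expressed with the eDisMax parser's qf parameter; in this hedged sketch (core name, fields, and boosts are assumptions), title matches count five times as much as body matches:

```python
import requests

# Hypothetical local Solr core named "university".
SOLR = "http://localhost:8983/solr/university/select"

params = {
    "q": "open university enrollment",
    "defType": "edismax",   # eDisMax query parser
    "qf": "title^5 body",   # field weighting: boost title over body
    "rows": 10,
}
resp = requests.get(SOLR, params=params).json()
print(resp["response"]["numFound"])
```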

How slow is a call to a local database? [closed]

In general, say you have a small (<16 MB) table in a database running on the same machine as your server. If you need to do lots of querying into this table (100% reads), is it better to:
Get the entire table and do all the searching/querying in the server code, or
Make lots of queries against the local database?
If the database is local, can I take advantage of the DBMS's highly efficient internal data structures for querying, or is the delay such that it's faster to map the tables returned by the database into my own data structures?
Thanks.
This is going to depend heavily on what kind of searches you're doing.
If your data is all ID lookups, it's probably faster to have it in RAM.
If your data is all full scans (no indexes), it's probably faster to have it in RAM.
If your data uses indexes, it's probably faster to have it in the DB.
Of course, much of the appeal of a database is indexes and a common query interface, so you have to weigh how valuable those are versus raw speed.
There's no way to really answer this without knowing exactly the nature of the data and queries to be done on it. Over-the-wire time has its cost, as does BSON <-> native marshalling, but indexed searches can be O(log n) as opposed to a dumb O(n) (or worse) search over a simple in-memory data structure.
Have you tried benchmarking?
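
If not, a rough starting point, sketched with Python's built-in sqlite3 and timeit (the table shape and row counts are made up; substitute your real DBMS, driver, and queries):

```python
import sqlite3
import timeit

# Option 2: indexed lookups against a local database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, val TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)",
                 [(i, f"value-{i}") for i in range(100_000)])

# Option 1: pull everything into your own in-memory structure.
cache = {i: f"value-{i}" for i in range(100_000)}

def db_lookup():
    conn.execute("SELECT val FROM t WHERE id = ?", (54321,)).fetchone()

def dict_lookup():
    cache[54321]

print("db:  ", timeit.timeit(db_lookup, number=10_000))
print("dict:", timeit.timeit(dict_lookup, number=10_000))
```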

NoSQL - what's the best option for IP tracking? [closed]

I'm trying to implement a system using node.js in which a number of sites would contain JS loaded from a common host, and trigger an action when some user visits n+ sites.
I suppose a NoSQL solution storing a mapping of IP address => array of sites visited would be preferable to an RDBMS, both in terms of performance and simplicity. The operations I need are "add to the array if not already there" and getting the length of the array. Also, I wouldn't like it all to sit in memory all the time, since the DB might get large some day.
What's the system that fits these requirements best? MongoDB seems like a nice option given that $addToSet exists, but maybe there's something better in terms of RAM usage?
When I hear about working with lists or sets, my first choice is Redis.
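
A minimal sketch of that approach with redis-py (key naming and the threshold are assumptions; any Redis client exposes the same SADD/SCARD commands):

```python
import redis

r = redis.Redis(host="localhost", port=6379)

def record_visit(ip: str, site: str, threshold: int = 3) -> bool:
    key = f"visits:{ip}"
    r.sadd(key, site)                  # sets give "add if not there" for free
    return r.scard(key) >= threshold   # cardinality = distinct sites visited

if record_visit("203.0.113.7", "example.org"):
    print("trigger the action")
```

One caveat against the RAM-usage requirement: Redis keeps the whole dataset in memory (with optional disk persistence), so if the data must eventually outgrow RAM, MongoDB's $addToSet on a disk-backed collection may be the better fit.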
