CouchDB works slow for the first time - couchdb

The problem I am facing with couchDB is whenever I hit any database for the first time, it fetches quite slowly, though the speed is increased from the second time. Is there any workaround we can do so that this glitch gets removed for the first time as well?

Secondary indexes in CouchDB are not updated during document write operations (doc). So the delay is because the view is actually generated for the first time.
For CouchDB 3.x: look into tuning background indexing
For CouchDB 2.x and before: upgrade and/or prefetch your views regularly so they've been built at the moment you need them quickly available
Ah, and if you're doing mango queries, then make sure required indexes are defined in the first place so you're not rescanning the DB every time :)

Related

couchdb request never completes

I am trying to request a large number of documents from my database (which has over 400k documents). I started using _all_docs built-in view. I first tried with this query:
http://database:port/databasename/_all_docs?limit=100&include_docs=true
No problem. Completes as expected. Now to ramp it up:
http://database:port/databasename/_all_docs?limit=1000&include_docs=true
Still fine. Took longer, more data, etc. as expected. Ramp it up again:
http://database:port/databasename/_all_docs?limit=10000&include_docs=true
Request never completes. The Dev tools in chrome show Size = 5.3MB (seems to be significant), and this occurs no matter what value for the limit parameter I use that is over 6500ish. No matter if i specify 6500 or 10,000, it always returns 5.3MB downloaded, and the request stalls.
I have also tried other combinations, such as "skip" and it seems that limit + skip must be < 6500 or I get the same stall.
My environment: Couchdb 1.6.1, Ubuntu 14.04.3 LTS, Azure A1 standard
you have to prewarm your queries, just throwing a 100K or more docs and expecting that you'd get them out of couchdb won't work, it just won't work.
When you ask for some items from a view (in your case Default View), at the first read CouchDB will notice that the B-tree for the view doesn't exist yet, so it goes ahead and builds it on the first read. Depending on how many documents you have in your database, that can take a while, putting a good work load on your database.
On every subsequent read, CouchDB will check if documents have changed since the last write, and throw the changed documents at the map and reduce function. So if you only query some views from time to time, but have lots of changes in between, expect some delays on the next read.
There are 2 ways to handle this situation
1. Pre-warm your view - run a cronjob that does reads to make sure that your view has the B-Tree for this View.
2. Prepare your view in advance for a particular query before inserting the data in the couchdb.
and for now if you really want to read all your docs, don't read them all at once, rather use the skip, limit range queries.

Most efficient way to create couchdb views

My CouchDB view indexes are being created slower than I would like. Writing the documents is not such a problem but the users can edit them offline and then bulk update, which seems to slow things right down.
This answer helped but I was just wondering is it better to separate out various views into different design documents (eg1) or to store them all in one (eg2).
Eg. 1
_design/posts/_view/id
_design/comments/_view/id
_design/tags/_view/id
Eg.2
_design/webresources/_view/_id?key="posts"
_design/webresources/_view/_id?key="comments"
_design/webresources/_view/_id?key="tags"
*This example is just for illustration purposes. I am only concerned with the time it takes to build the indexes.
You will gain better performance if you read often. Couchdb views are updated and build at read time. So you can can read the view every time the document updates to keep it hot*.
Or maybe listen to the changes feed and keep a track of documents updated. Once they reach a certain threshold value read a view.
Another option is use stale parameter.
If stale=ok is set, CouchDB will not refresh the view even if it is stale, the benefit is a an improved query latency. If stale=update_after is set, CouchDB will update the view after the stale result is returned
Every design document is a separate erlang process. So separating your views across different design documents will cause them to be built concurrently. However each view will still be built in a blocking manner. That is the two views across different design documents can start updating at the same time but the time it takes to update the individual views will be the same as if they were in the same design document.
*You don't necessarily have to care about the result. Our goal here is to trick couchdb to update the view. So you can fire off a request in a separate async process and be done with it.

Using Solr with frequently updated data

I have a site search I would like to implement using Solr. Unfortunately, I also have a lot of frequently updated dynamic data in my MySQL database from cron jobs, which I would also like to be searchable.
I would automatically assume that constantly updating records in Solr is not a good idea so is there a workable solution to give me the text-search power of Solr as well as being able to filter based on these frequently updated fields?
I think this depends what "frequently" means and how long your tolerated Solr-lag is.
In my case, i update Solr twice every minute, which works fine.
..based on an MySql DB with some hundred updates a Minute.
In this situation it's important NOT to run an optimize on every Solr update/commit. Better run an optimize every n hoers.
So finally, all the new MySQL stuff will be visible in Solr with max. 30 sec. delay.
It depends on your situation if this is acceptable.

Users last-access time with CouchDB

I am new to CouchDB, but that is not related to the problem. The question is simple, yet not clear to me.
For example: Boris was on the site 5 seconds ago and viewing his profile Ivan sees it.
How to correctly implement this feature (users last-access time)?
The problem is that, if we update users profile document in CouchDB, for ex. property last_access_time, each time a page is refreshed, than we will have the most relevant information (with MySQL we did it this way), but on the other hand, we will have _rev of the document somewhere about 100000++ by the end of the day.
So, how do you do that or do you have any ideas?
This is not a full answer but a possible optimization. It will work in addition to any other answers here.
Instead of storing the latest timestamp, update the timestamp only if it has changed by e.g. 5 seconds, or 60 seconds.
Assume a user refreshes every second for a day. That is 86,400 updates. But if you only update the timestamp at 5-second intervals, that is 17,280; for 60-seconds it is 1,440.
You can do this on the client side. When you want to update the timestamp, fetch the current document and check the old timestamp. If it is less than 5 seconds old, don't do anything. Otherwise, update it normally.
You can also do it on the server side. Write an _update function in CouchDB, which you can query like e.g. POST /db/_design/my_app/_update/last-access/the_doc_id?time=2011-01-31T05:05:31.872Z. The update function will do the same thing: check the old timestamp, and either do nothing, or update it, depending on the elapsed time.
If there was (a large) part of a document that is relatively static, and (a small) part that is highly dynamic, I would consider splitting it into two different documents.
Another option might be to use something more suited to the high write throughput of small pieces of data of that nature such as Redis or possibly MongoDB, and (if necessary) have a background task to occasionally write the info to CouchDB.
CouchDB has no problem with rapid document updates. Just do it, like MySQL. High _rev is no problem.
The only thing is, you have to be responsible about your couch from day 1. All CouchDB users must do this anyway, however you may have to do it sooner. (Applications with few updates have lower risk of a full disk, so developers can postpone this work.)
Poll your database and run compaction if it needs it (based on size, document count, seq_id number)
Poll your views and run compaction too
Always have enough disk capacity and i/o bandwidth to support compaction. Mathematical worst-case: you need 2x the database size, and 2x the write speed; however, most applications require less. Since you are updating documents, not adding them, you will need way less.

Solr for constantly updating index

I have a news site with 150,000 news articles. About 250 new articles are added daily to the database at an interval of 5-15 minutes. I understand that Solr is optimized for millions of records and my 150K won't be a problem for it. But I am worried the frequent updation will be a problem, since the cache gets invalidated with every update. In my dev server, cold load of a page takes 5-7 seconds to load (since every page runs a few MLT queries).
Will it help, if I split my index into two - An archive index and a latest index. The archive index will be updated once every day.
Can anyone suggest any ways to optimize my installation for a constantly updating index?
Thanks
My answer is: test it! Don't try to optimize yet if you don't know how it performs. Like you said, 150K is not a lot, it should be quick to build an index of that size for your tests. After that, run a couple of MLT queries from a different concurrent threads (to simulate users) while you index more documents to see how it behaves.
One setting that you should keep an eye on is auto-commit. Since you are indexing constantly, you can't commit at each document (you will bring Solr down). The value that you will choose for this setting will let you tune the latency of the system (how many times it takes for new documents to be returned in results) while keeping the system responsive.
Consider using mlt=true in the main query instead of issuing per-result MoreLikeThis queries. You'll save the roundtrips and so it will be faster.

Resources