CouchDB View Compaction

I understand that compaction of a db removes old revisions beyond the limit set in the config. The result is decreased disk usage, with little to no effect on view speed, because old revisions aren't part of the view index.
I recognize that view compaction is different from view cleanup, which removes unused view index files to save space.
However, what happens with a view compaction? I haven't been able to find much documentation on this, just that it is necessary. Does it operate similarly to db compaction in that it removes old revisions from design docs? If so, I don't think there is much of a benefit as design docs are usually small and few.

View index files are append-only, just like database files, so when you change documents, stale data accumulates in your view index until you run a compaction, just as with a database. The documentation doesn't state this explicitly, but it's implied by the statement that views also need compaction like databases.
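As a quick illustration (the database and design document names below are placeholders), view compaction is triggered per design document via the HTTP API, with the name given without the _design/ prefix:

# Compact the view indexes belonging to one design document
curl -X POST -H "Content-Type: application/json" http://127.0.0.1:5984/mydb/_compact/mydesigndoc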

Related

CouchDB taking lot of space due to revisions

We have a project that syncs a database with PouchDB on mobile devices. We ran into an issue when updating many documents (8,400 docs per minute): internal storage keeps growing, at around 20MB per minute.
We figured out that one main reason for this is CouchDB revisions, so we decided to decrease the database _revs_limit to around 5. But we have heard this may impact the replication process between CouchDB and PouchDB. My first question is:
how does this decrease of the revision limit impact the replication process?
We also found that views take up more space than the document storage itself. My second question: is there any way to reduce CouchDB view size?
Your data model (fast updates) doesn't play to CouchDB's strengths. Even after compaction, old revisions (including tombstones) take up space. CouchDB is happiest with small, immutable documents; such a model is also less likely to suffer from update conflicts.
Look at your documents - can they be broken apart so that updates become new document writes? Typical indicators are nested objects or arrays that grow in documents over time.
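As a rough sketch of that idea (the database name, document IDs, and fields here are all made up for illustration): instead of re-saving one document whose array grows with every event, write each event as its own small, immutable document.

# Anti-pattern: repeatedly updating one growing document leaves a new revision behind every time
curl -X PUT -H "Content-Type: application/json" http://127.0.0.1:5984/shop/restaurant_1 -d '{"_rev":"42-abc123","name":"Luigi","orders":[101,102,103,104]}'

# Better: one small immutable document per event - nothing is ever updated, so nothing conflicts
curl -X POST -H "Content-Type: application/json" http://127.0.0.1:5984/shop -d '{"type":"order","restaurant":"restaurant_1","total":12.5}'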

How does CouchDB compaction affect /db/_changes?

I have an application that pulls from CouchDB's changes feed, from the first doc to the latest one, batch by batch.
I compacted my database from 1.7GB to 1.0GB, and /db/_changes seems the same.
Can anyone please clarify whether CouchDB compaction affects /db/_changes?
All compaction does is remove old revisions of documents in a given database. The changes feed deals exclusively with write operations, and since those writes have already happened, they are unaffected by compaction.
Now, it should be noted that the changes feed will give you the rev numbers as well. Upon compaction, all but the most recent rev are deleted, so those entries in the changes feed will have "dead" links, so to speak.
See the docs for more information about compaction.
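For illustration, here is what the feed looks like (the database name is a placeholder, and the response shown is abridged and illustrative). Each entry carries the document's seq, id, and rev; after compaction, the older revs listed here no longer resolve:

curl 'http://127.0.0.1:5984/mydb/_changes?since=0&limit=2'
{"results":[{"seq":1,"id":"doc1","changes":[{"rev":"1-967a00d"}]},{"seq":2,"id":"doc2","changes":[{"rev":"2-7051cbe"}]}],"last_seq":2}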

CouchDB size management

I have a couch database called restaurants.couch, and once a week I run
Compact Database, and the file size goes from 20MB to 11MB.
I see that the older versions are gone - which I don't need.
However, I notice that the .restaurants_design folder size is over 100MB
and there are 30+ .view files in here.
I executed Compact Views and Clean Up Views, which reduced the folder to 8MB and just one .view file. My app also ran a little faster.
My questions :
1) Why does .restaurants_design get so large? What's the reason? It's just a view.
2) What's the benefit of Compact Views and Clean Up Views ( besides file size reduction ) ?
3) What are the side effects of Compact Views and Clean Up Views ? Will I ever regret doing this if I don't need versioning ?
4) How often should Compact Database, Compact Views, and Clean Up Views
be performed if the app does not really need versioning - just a simple
NoSQL database with some views?
CouchDB persists view results on disk as indexes so that queries are fast. If you have lots of views and users query them, all of those indexes live on disk and grow as documents change.
Compact Views rewrites a design document's current index files, discarding the stale B-tree nodes that incremental index updates leave behind. Clean Up Views removes outdated index files: view indexes on disk are named after the MD5 hash of the view definition, so when you change or delete a view, the old index file stays on disk until a view cleanup deletes it.
The side effect is that the next query against a view whose index was removed will take longer, since CouchDB needs to rebuild the index. Compact Views and Clean Up Views do not affect versioning, but Compact Database does.
The frequency really depends on the way your application works and the limitations of your deployment setup. However, if you really want to keep versioning to a minimum, look into the _revs_limit setting, so you can set a limit on revisions for all documents.
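For reference, a minimal sketch of the three maintenance operations via the HTTP API (the design document name is a placeholder; Futon's buttons call these same endpoints):

# Compact Database: drop old revisions beyond _revs_limit
curl -X POST -H "Content-Type: application/json" http://127.0.0.1:5984/restaurants/_compact

# Compact Views: rewrite the index files of one design document
curl -X POST -H "Content-Type: application/json" http://127.0.0.1:5984/restaurants/_compact/mydesigndoc

# Clean Up Views: delete index files whose view definitions no longer exist
curl -X POST -H "Content-Type: application/json" http://127.0.0.1:5984/restaurants/_view_cleanup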

Replicating CouchDB to local couch reduces size - why?

I recently started using Couch for a large app I'm working on.
I have a database with 7907 documents and wanted to rename it. I poked around for a bit but couldn't figure out how to rename it, so I figured I would just replicate it to a local database with the name I wanted.
The first time I tried, the replication failed; I believe the error was a timeout. I tried again, and it worked very quickly, which was a little disconcerting.
After the replication, I'm showing that the new database has the correct amount of records, but the database size is about 1/3 of the original.
Also a little odd is that if I refresh Futon, the size of the original fluctuates between 94.6 and 95.5 MB.
This leaves me with a few questions:
Is the 2nd database storing references to the first? If so, can I delete the first without causing harm?
Why would the size be so different? Had the original built indexes that the new one eventually will?
Why is the size fluctuating?
edit:
A few things that might be helpful:
This is on a Cloudant CouchDB install
I checked the first and last record of the new db, and they match, so I don't believe Futon is underreporting.
Replicating to a new database is similar to compaction. Both involve certain side-effects (incidentally and intentionally, respectively) which reduce the size of the new .couch file:
The B-tree indexes get balanced.
Data from old document revisions is discarded.
Metadata from previous updates to the DB is discarded.
Replications record checkpoints, so if you re-replicate from the same source to the same target (i.e. re-run a replication that timed out), it will pick up where it left off.
Answers:
Replication does not create a reference to another database. You can delete the first without causing harm.
Replicating (and compacting) generally reduces disk usage. If you have any views in any design documents, those will re-build when you first query them. View indexes use their own .view file which also consumes space.
I am not sure why the size is fluctuating. Browser and proxy caches are the bane of CouchDB (and web) development. But perhaps it is also a result of internal Cloudant behavior (for example, different nodes in the cluster reporting slightly different sizes).
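For completeness, a minimal sketch of the rename-by-replication approach described in the question (database names are placeholders; create_target asks CouchDB to create the destination if it is missing):

curl -X POST -H "Content-Type: application/json" http://127.0.0.1:5984/_replicate -d '{"source":"olddb","target":"newdb","create_target":true}'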

CouchDB .view file growing out of control?

I recently encountered a situation where my CouchDB instance used all available disk space on a 20GB VM instance.
Upon investigation I discovered that a directory in /usr/local/var/lib/couchdb/ contained a bunch of .view files, the largest of which was 16GB. I was able to remove the *.view files to restore normal operation. I'm not sure why the .view files grew so large and how CouchDB manages .view files.
A bit more information. I have a VM running Ubuntu 9.10 (karmic) with 512MB and CouchDB 0.10. The VM has a cron job which invokes a Python script which queries a view. The cron job runs once every five minutes. Every time the view is queried the size of a .view file increases. I've written a job to monitor this on an hourly basis and after a few days I don't see the file rolling over or otherwise decreasing in size.
Does anyone have any insights into this issue? Is there a piece of documentation I've missed? I haven't been able to find anything on the subject but that may be due to looking in the wrong places or my search terms.
CouchDB is very disk hungry, trading disk space for performance. Views will increase in size as items are added to them. You can recover disk space that is no longer needed with cleanup and compaction.
Every time you create, update, or delete a document, the view indexes will be updated with the relevant changes to the documents. The update to the view happens when it is queried. So if you are making lots of document changes, you should expect your index to grow, and it will need to be managed with compaction and cleanup.
If your views are very large for a given set of documents then you may have poorly designed views. Alternatively your design may just require large views and you will need to manage that as you would any other resource.
It would be easier to tell what is happening if you could describe what document updates (inc create and delete) are happening and what your view functions are emitting, especially for the large view.
Your .view files grow each time you access a view because CouchDB updates views on access. CouchDB views need compaction, just like databases. If you have frequent changes to your documents, resulting in changes to your views, you should run view compaction from time to time. See http://wiki.apache.org/couchdb/HTTP_view_API#View_Compaction
To reduce the size of your views, have a look at the data you are emitting. When you emit(foo, doc), the entire document is copied into the view so that it is instantly available when you query the view. A map function like function(doc) { emit(doc.title, doc); } will result in a view as big as the database itself. You could instead emit(doc.title, null); and use the include_docs option to let CouchDB fetch the document from the database when you access the view (which comes with a slight performance penalty). See http://wiki.apache.org/couchdb/HTTP_view_API#Querying_Options
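A small sketch of the second approach (the database, design document, and view names are made up): with emit(doc.title, null) the index stays small, and include_docs pulls each document in at query time:

curl 'http://127.0.0.1:5984/mydb/_design/app/_view/by_title?include_docs=true'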
Use sequential or monotonic IDs for documents instead of random ones
Yes, CouchDB is very disk hungry, and it needs regular compactions. But there is another thing that can help, by reducing disk usage that is unnecessary in the first place.
CouchDB uses B+ trees for storing data/documents, which is a very good data structure for data-retrieval performance. However, the B+ tree trades disk space for that performance. With completely random IDs, the B+ tree fans out quickly: since the minimum fill rate is 1/2 for every internal node, the nodes are mostly filled up to 1/2 (as the data spreads evenly due to its randomness), generating more internal nodes. New insertions can also force rewrites of large parts of the tree. That's what randomness can cause ;)
Using sequential or monotonic IDs instead can avoid all of this.
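If you let CouchDB generate document IDs, it can be told to produce sequential ones. A hedged sketch for the 1.x-era _config API this discussion is set in (admin credentials are placeholders; CouchDB 2.x+ moved this endpoint under /_node/{node-name}/_config):

curl -X PUT http://admin:password@127.0.0.1:5984/_config/uuids/algorithm -d '"sequential"'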
I've had this problem too, trying out CouchDB for a browser-based game.
We had about 100,000 unexpected visitors on the first day of a site launch, and within 2 days the CouchDB database was taking up about 40GB of space. This made the server crash because the HD was completely full.
Compaction brought that back to about 50MB. I also set the _revs_limit (which defaults to 1000) to 10, since we didn't care about revision history, and it's been running perfectly since. After almost 1M users, the database size is usually about 2-3GB. When I run compaction it's about 500MB.
Setting document revision limit to 10:
curl -X PUT -d "10" http://dbuser:dbpassword@127.0.0.1:5984/yourdb/_revs_limit
Or without user:password (not recommended):
curl -X PUT -d "10" http://127.0.0.1:5984/yourdb/_revs_limit
