Whenever a document is accessed, I would like to record the last access time in the document.
How do I update a document in a view whenever there is a GET request?
You can't. A GET (when used correctly) does not modify data; CouchDB uses GET correctly.
If you really want to record an access time like this you'll need to update the document with the new timestamp and PUT the document back to CouchDB. However, if more than a few people are accessing a document you're quite likely to get contention over it and get conflict errors from CouchDB.
One option is to create a new "document accessed" document in CouchDB on each access, but that would rapidly increase the size of the database. On the other hand, you would have a history of access times, if that's useful.
Personally, I would look at simply logging document access to a file or queue and processing the file/queue in the background. You could have one "document accessed" document per real document, as there's little or no chance of contention and a failed update probably wouldn't really matter (you could always try again anyway).
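A minimal sketch of the background step, assuming the access log has already been read into an array of entries. The entry shape, the "access:" ID prefix and the function name are all hypothetical; the result is one "document accessed" document per real document, ready to be saved with whatever client you use.

```javascript
// Hypothetical log entry shape: { docId: "post-1", at: "2024-01-01T10:00:00.000Z" }
// Collapse a batch of access-log entries into one "document accessed"
// document per real document, keeping only the latest timestamp.
function collapseAccessLog(entries) {
  const latest = new Map();
  for (const { docId, at } of entries) {
    const seen = latest.get(docId);
    // ISO 8601 UTC strings in the same format sort chronologically as plain strings.
    if (seen === undefined || at > seen) {
      latest.set(docId, at);
    }
  }
  return Array.from(latest, ([docId, at]) => ({
    _id: "access:" + docId, // assumed naming scheme, one per real document
    type: "document_accessed",
    doc_id: docId,
    last_accessed_at: at,
  }));
}
```

Updating an access document that already exists still needs its current _rev, but since only the background job writes these documents, a conflict is unlikely and can simply be retried.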
My CouchDB view indexes are being created slower than I would like. Writing the documents is not such a problem but the users can edit them offline and then bulk update, which seems to slow things right down.
This answer helped but I was just wondering is it better to separate out various views into different design documents (eg1) or to store them all in one (eg2).
Eg. 1
_design/posts/_view/id
_design/comments/_view/id
_design/tags/_view/id
Eg. 2
_design/webresources/_view/_id?key="posts"
_design/webresources/_view/_id?key="comments"
_design/webresources/_view/_id?key="tags"
*This example is just for illustration purposes. I am only concerned with the time it takes to build the indexes.
You will get better performance if you read often. CouchDB views are built and updated at read time, so you can read the view every time a document is updated to keep it hot*.
Alternatively, listen to the changes feed and keep track of how many documents have been updated. Once the count reaches a certain threshold, read the view.
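The threshold idea can be sketched as a small counter. This is only an illustration: makeViewWarmer and refreshView are hypothetical names, and how you receive changes (for example from the _changes feed) is left to the caller.

```javascript
// Count changes and call `refreshView` once `threshold` changes have
// accumulated. `refreshView` is a hypothetical callback that would issue
// a GET against the view; its result can be ignored.
function makeViewWarmer(threshold, refreshView) {
  let pending = 0;
  return function onChange() {
    pending += 1;
    if (pending >= threshold) {
      pending = 0; // start counting towards the next refresh
      refreshView();
    }
  };
}
```

Call the returned function once per change notification; the view is then read once for every `threshold` updates rather than after each one.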
Another option is to use the stale parameter.
If stale=ok is set, CouchDB will not refresh the view even if it is stale; the benefit is improved query latency. If stale=update_after is set, CouchDB will update the view after the stale result is returned.
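As a rough illustration, the helper below builds a view URL with the stale parameter. The helper name and the example host, database and view names are placeholders. As far as I know, later CouchDB releases deprecate stale in favour of the stable and update parameters, so check the documentation for your version.

```javascript
// Build a view query URL with an optional `stale` parameter.
// `baseUrl`, `db`, `ddoc` and `view` are placeholders for your own setup.
function viewUrl(baseUrl, db, ddoc, view, stale) {
  const path =
    encodeURIComponent(db) +
    "/_design/" + encodeURIComponent(ddoc) +
    "/_view/" + encodeURIComponent(view);
  const url = new URL(path, baseUrl);
  // Only the two values described above are passed through.
  if (stale === "ok" || stale === "update_after") {
    url.searchParams.set("stale", stale);
  }
  return url.toString();
}

// Example (hypothetical names):
// viewUrl("http://localhost:5984/", "blog", "posts", "id", "update_after")
```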
Every design document is a separate Erlang process, so separating your views across different design documents will cause them to be built concurrently. However, each view will still be built in a blocking manner. That is, two views in different design documents can start updating at the same time, but the time it takes to update each individual view will be the same as if they were in the same design document.
*You don't necessarily have to care about the result. The goal here is to trick CouchDB into updating the view, so you can fire off the request in a separate async process and be done with it.
In my CouchDB database I'd like all documents to have an 'updated_at' timestamp added when they're changed (and have this enforced).
I can't modify the document with validation functions.
Update functions won't run unless they're called explicitly (so it would be possible to update the document without calling the update function).
How should I go about implementing this?
There is currently no way to do this without going through _update handlers. Tracking when documents change is a nice idea, but it runs into problems with replication.
Replication works on top of the public API, which means that:
If such a trigger were enforced, replication would break, since it would be impossible to sync data as-is without modifying the document. Because the document is modified, it receives a new revision, which can easily lead to an endless loop if you replicate from database A to B and from B to A in continuous mode.
Otherwise, if replication is made to work, there will always be a way to work around your trigger.
I can suggest one workaround: you can create a view which emits the current date as a key (or as part of it):
function (doc) {
  // The key is the time at which this document was indexed.
  emit(new Date(), null);
}
This will assign the current date to all documents as soon as view generation is triggered (which happens after the first request to it) and will assign a new date each time a specific document is updated.
Although the above should solve your issue, I would advise against using it for the reasons already explained by Kxepal: if you're on a replicated network, each node will assign its own dates. Taking this into account, the best I can recommend is to solve the issue on the client side and just post the documents with a date already embedded.
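A minimal sketch of the client-side approach, paired with a validation function that only checks the field. This is one possible arrangement, not something the answers above prescribe: the helper names are hypothetical, and the validation function can reject a write but cannot add or change the timestamp itself.

```javascript
// Client side: return a copy of the document with `updated_at` set.
// The field name comes from the question; the helper name is hypothetical.
function withUpdatedAt(doc, now = new Date()) {
  return { ...doc, updated_at: now.toISOString() };
}

// Server side: body for a design document's validate_doc_update field
// (stored there as a string). It only rejects writes without a timestamp.
function validateUpdatedAt(newDoc, oldDoc, userCtx, secObj) {
  if (newDoc._deleted) {
    return; // let deletions (tombstones) through without a timestamp
  }
  if (typeof newDoc.updated_at !== "string" || isNaN(Date.parse(newDoc.updated_at))) {
    throw { forbidden: "updated_at must be a valid date string" };
  }
}
```

Because the date is set by the client before the document is posted, replicated copies carry the same value on every node. The check only proves that a timestamp is present, not that it was refreshed on this particular update.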
I'm trying to avoid revisions building up in my CouchDB, and I'd also like to use TouchDB's "bulk pull" for replication (it bulk-pulls on all 1st-revs). Would it be bad practice to just delete a document and recreate it rather than modifying it, so that all documents stay at rev-1?
Deleting a document in CouchDB will not reset the _rev.
CouchDB never deletes a document; it simply marks the last revision as deleted. Compaction will delete previous revisions, keeping only the last one. This is needed for replication to work properly, and it is why the deleted revision of a document should not contain any data, only the _id of the document and the _deleted flag.
The only way to completely remove all traces of deleted documents is to copy all documents to a new database. But keep in mind the consequences for replication.
Well, I want to say that your proposal makes me feel dirty, but that wouldn't be an SO answer, so...
You mention TouchDB and bulk pull, so you have a mobile app with data which can be modified externally and which, I assume, wants to be able to modify its own data. The biggest issue I can think of would be update conflict resolution, i.e. how do you handle changes to the document on both the client and the server while the client is offline? I think you'll end up having to do a lot of the synchronisation work that Couch is meant to handle for you.
Is it possible to delete all documents in a couchdb database, except design documents, without creating a specific view for that?
My first approach has been to access the _all_docs standard view, and discard those documents starting with _design. This works but, for large databases, is too slow, since the documents need to be requested from the database (in order to get the document revision) one at a time.
If this is the only valid approach, I think it is much more practical to delete the complete database, and create it from scratch inserting the design documents again.
I can think of a couple of ideas.
Use _all_docs
You do not need to fetch all the documents, only the ID and revisions. By default, that is all that _all_docs returns. You can make a pretty big request in a batch (10k or 100k docs at a time should be fine).
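To illustrate, the sketch below turns one batch of an _all_docs response into a _bulk_docs body that marks every non-design document as deleted. The function name is hypothetical; the row shape (id, value.rev) is what _all_docs returns by default.

```javascript
// Turn an _all_docs response into a _bulk_docs request body that deletes
// every document except design documents.
function buildBulkDelete(allDocs) {
  const docs = allDocs.rows
    .filter((row) => !row.id.startsWith("_design/"))
    .map((row) => ({ _id: row.id, _rev: row.value.rev, _deleted: true }));
  return { docs };
}
```

POST the result as JSON to /db/_bulk_docs, then repeat with the next batch (for example using limit and startkey on _all_docs) until only design documents remain.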
Replicate then delete
You could use an _all_docs query to get the IDs of all design documents.
GET /db/_all_docs?startkey="_design/"&endkey="_design0"
Then replicate them somewhere temporary.
POST /_replicator
{ "source":"db", "target":"db_ddocs", "create_target":true
, "user_ctx": {"roles":["_admin"]}
, "doc_ids": ["_design/ddoc_1", "_design/ddoc_2", "etc..."]
}
Now you can just delete the original database and replicate the temporary one back by swapping the "source" and "target" values.
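The two replication documents can be generated from the _all_docs result, as sketched below. The function names are hypothetical; the fields are the ones shown in the replication document above.

```javascript
// Extract design document IDs from the _all_docs response for the
// startkey="_design/" / endkey="_design0" query.
function designDocIds(allDocs) {
  return allDocs.rows.map((row) => row.id);
}

// Build the replication document that copies only the design documents.
function replicationDoc(source, target, ddocIds) {
  return {
    source,
    target,
    create_target: true,
    user_ctx: { roles: ["_admin"] },
    doc_ids: ddocIds,
  };
}

// Swap source and target to copy the design documents back afterwards.
function reversed(rep) {
  return { ...rep, source: rep.target, target: rep.source };
}
```

POST the first document to /_replicator, delete the original database once the replication has finished, then POST the reversed document.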
Deleting vs "deleting"
Note, these are really apples vs. oranges techniques. By deleting a database, you are wiping out the edit history of all its documents. In other words, you cannot replicate those deletion events to any other database. When you "delete" a document in CouchDB, it stores a record of that deletion. If you replicate that database, those deletions will be reflected in the target. (CouchDB stores "tombstones" indicating the document ID, its revision history, and its deleted state.)
That may or may not be important to you. The first idea is probably considered more "correct" however I can see the value of the second. You can visualize the entire program to accomplish this in your head. It's only a few queries and you're done. No looping through _all_docs batches, no headache. Your specific situation will probably make it obvious which is better.
Install couchapp, pull down the design doc to your hard disk, delete the db in futon, push the design doc back up to your recreated database. =)
You could write a shell script that goes through the list of all documents and deletes them all one by one except design docs. Apparently couch-batch can do that. Note that you don't need to fetch the whole docs to do that, just the id and revision.
Other than that, I think filtered replication (or the replication proposed by JasonSmith) is your best bet.
I am evaluating CouchDB for persistent cart functionality. If I create one document per user and store each cart item as a field, how many items can I store? In the current scenario I can have up to 500 items in a cart.
Doc-per-cart and doc-per-item are both fine choices; neither kind of document sounds like it would get very large (JSON encoding/decoding is slower for very large documents, and they must be held entirely in memory). On balance, I'd prefer doc-per-item. Of course, you will need to create a (simple) view to display the cart if you go with doc-per-item.
One good reason to prefer doc-per-item is CouchDB's MVCC. Adding an item to a cart will always create a new document, so you will not need to know the current _rev of anything. When a user wants to delete an item, you will have its _id and _rev and can easily delete it. If you go with doc-per-cart, you will be constantly updating one document, which requires you to have its current _rev all the time.
Note that doc-per-item will allow duplicates in your cart (the user hits Reload and makes two additions instead of one) but as long as the display of the cart shows this, and the final checkout page does too, then I think it's a reasonable failure mode.
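A sketch of the simple view that doc-per-item needs, assuming each cart item is a document with hypothetical type, user_id, sku and qty fields. Inside CouchDB, emit is provided by the server; the stand-in here only exists so the sketch can run on its own.

```javascript
const rows = [];
// Stand-in for CouchDB's emit() so the sketch runs outside the server.
function emit(key, value) {
  rows.push({ key, value });
}

// Map function for a hypothetical "cart" view: one row per cart item,
// keyed by user so a cart can be fetched with ?key="<user id>".
function cartMap(doc) {
  if (doc.type === "cart_item") {
    emit(doc.user_id, { sku: doc.sku, qty: doc.qty });
  }
}

// Simulate indexing a few documents (made-up sample data).
[
  { _id: "i1", type: "cart_item", user_id: "u1", sku: "A-1", qty: 2 },
  { _id: "i2", type: "cart_item", user_id: "u2", sku: "B-7", qty: 1 },
  { _id: "p1", type: "profile", user_id: "u1" },
].forEach(cartMap);
```

Only cartMap would go into the design document. A duplicate addition simply shows up as a second row for the same user, which matches the failure mode described above.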
A quick review of the CouchDB overview should make it clear that there is no inherent limit on the number of fields in a CouchDB document, and therefore no limit (aside from available memory) to the number of items you can store in your cart.