How can I delete multiple documents in CouchDB? - couchdb

I want to delete all documents where foo equals x. Seems like a pretty basic operation, but I just can't figure it out.
I know how to delete an individual document, but that's not good enough - I may have to delete a few thousand at a time.
How do I bulk delete documents in CouchDB?

I don't know if it's the right way, but you can make a view that exposes the foo field, query the view for the doc._id values of all the documents you want to delete, and then make a bulk update against those documents. So ideally two calls to Couch.
http://comments.gmane.org/gmane.comp.db.couchdb.user/11222
describes a similar approach.
Do a bulk update on all the documents you want to delete, setting doc._deleted=true on each, following the example in Bulk deletion of documents.
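The two calls described above can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: it assumes a hypothetical view that emits doc.foo as the key and doc._rev as the value, and the names build_bulk_delete_payload, post_bulk_docs, and the sample rows are made up for the example. The HTTP helper is defined but not called, so the snippet runs without a server.

```python
import json
import urllib.request


def build_bulk_delete_payload(view_rows):
    """Turn view rows into a _bulk_docs body that marks each doc as deleted.

    Assumes each row looks like {"id": ..., "key": ..., "value": <_rev>},
    i.e. the (hypothetical) view emits the revision as its value.
    """
    docs = [
        {"_id": row["id"], "_rev": row["value"], "_deleted": True}
        for row in view_rows
    ]
    return {"docs": docs}


def post_bulk_docs(db_url, payload):
    """POST the payload to <db_url>/_bulk_docs and return the parsed reply."""
    request = urllib.request.Request(
        db_url.rstrip("/") + "/_bulk_docs",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Made-up rows, shaped like the result of querying the view with key="x".
sample_rows = [
    {"id": "doc-a", "key": "x", "value": "1-abc"},
    {"id": "doc-b", "key": "x", "value": "2-def"},
]
payload = build_bulk_delete_payload(sample_rows)
print(json.dumps(payload, indent=2))
```

With a real database, the first call would fetch the view rows and the second would be post_bulk_docs(db_url, payload).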

It's quite easy with bulk delete: https://wiki.apache.org/couchdb/HTTP_Bulk_Document_API
Just POST to _bulk_docs a JSON body whose "docs" array contains entries that look like:
{"_id": "0", "_rev": "1-62657917", "_deleted": true}
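Since the question mentions a few thousand documents at a time, it may help to send the deletion stubs in batches. The sketch below only builds the request bodies for the Bulk Document API endpoint (_bulk_docs), which expects the stubs wrapped in a "docs" array. The batch size of 500 and the helper names are arbitrary assumptions, and the _rev values are placeholders; in practice each stub needs the document's current revision.

```python
import json


def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def bulk_delete_bodies(stubs, batch_size=500):
    """Build one JSON request body per batch of deletion stubs."""
    return [json.dumps({"docs": batch}) for batch in chunked(stubs, batch_size)]


# Placeholder stubs: a real run must use each document's current _rev.
stubs = [
    {"_id": str(n), "_rev": "1-62657917", "_deleted": True}
    for n in range(1200)
]
bodies = bulk_delete_bodies(stubs, batch_size=500)
print(len(bodies))
```

Each element of bodies would then be sent as the body of a separate POST request.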

I also needed something to handle that and, since there was nothing at the time, I decided to make my own implementation.
You can find it here.
Update
Since it was very helpful to me, and in order to protect myself from mistakes, I added a backup/restore feature to this tool, available as of version 0.2.

I tried a somewhat long method to delete documents. I first created a view called map_fun that selected the documents I wanted to delete. I then iterated through the view, stored the keys of all the documents, and used del db[doc_id] to delete them.
# couchdb-python expects the map function as a string of JavaScript
map_fun = '''function(doc) {
    if (doc.doc_type == 'classic') {
        emit(doc._id, doc);
    }
}'''
# db is a couchdb-python Database; collect the IDs first, then delete
deldoclist = []
for row in db.query(map_fun):
    deldoclist.append(row.key)
for item in deldoclist:
    del db[item]

Related

Deleting all documents in CouchDB

I have a database and I want to truncate all records. I know it is possible to add a _deleted key to every document or to call db.delete() in the couchdb-python library. I am using couchdb-python's delete, but it does not seem to work when I fetch all the documents and then call .delete on each document, excluding design documents.
Here is my code.
docs = get_db().view('_all_docs', include_docs=True)
for i in docs:
    if not(i['id'].startswith('_')):
        get_db().delete(i)
This is the error. It happens because each result from _all_docs has an id key instead of _id.
File "C:\Users\User\AppData\Local\Programs\Python\Python36-32\lib\site-packages\couchdb\client.py", line 625, in delete
if doc['_id'] is None:
KeyError: '_id'
My question is how do I fetch all documents that returns _id instead of just the id? Or is there any way around this?
In couchdb-python a view query returns a list of couchdb.client.Row objects, not a list of the docs. You need to pass the row's doc attribute to delete, i.e. get_db().delete(i['doc']).
From a performance perspective, however, it's better to use the bulk API. With couchdb-python it should look something like this:
rows = get_db().view('_all_docs', include_docs=True)
docs = []
for row in rows:
    if row['id'].startswith('_'):
        continue
    doc = row['doc']
    doc['_deleted'] = True
    docs.append(doc)
get_db().update(docs)
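A bulk update can succeed for some documents and fail for others, so it is worth checking the result. As far as I know, couchdb-python's update() returns one (success, doc_id, rev_or_exc) tuple per document; treat that shape as an assumption and check it against your installed version. The sketch below uses made-up sample tuples in place of a real return value, and summarize_bulk_results is a hypothetical helper.

```python
def summarize_bulk_results(results):
    """Split (success, doc_id, rev_or_exc) tuples into deleted and failed IDs."""
    deleted = []
    failed = {}
    for success, doc_id, rev_or_exc in results:
        if success:
            deleted.append(doc_id)
        else:
            # On failure the third element is assumed to be an exception.
            failed[doc_id] = str(rev_or_exc)
    return deleted, failed


# Stand-in for the return value of get_db().update(docs); values are made up.
sample_results = [
    (True, "doc-a", "2-aaa"),
    (False, "doc-b", Exception("conflict")),
    (True, "doc-c", "3-ccc"),
]
deleted, failed = summarize_bulk_results(sample_results)
print(deleted)
print(failed)
```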
Deleting documents from CouchDB can be done in two steps:
create a view that filters for the documents you want to delete
use the view to delete all the documents it returns
I have written a tool for this.

Update a million records in MongoDB, each with a subdocument array that also needs to be updated

I'm a newbie in Node.js and MongoDB, so please excuse my silly doubts :D but I need help right now.
{
    "_id": "someid",
    "data": "some_data",
    "subData": [
        {
            "_id": "someid",
            "data": "some_data"
        },
        {
            "_id": "some_id",
            "data": "some_data"
        }
    ]
}
I have a schema like the one above; imagine I have millions of documents in that schema, and now I want to update them.
Based on a condition, I want to select a set of them, modify their "subData" arrays, and save the changes.
I know there is no way to do that in one query, and there is a Jira issue for that feature, but my question now is: what is the most efficient way to update a million records in MongoDB?
Thanks in advance :)
Going by the schema you have posted, it is good that you maintain a specific id for each subdocument; this is added automatically if you are using mongoose (in case the backend is Node.js).
I would like to quote something from the issue you linked alongside your question:
It doesn't just not work for updating multiple items in one
array, it also doesn't work for updating a single item in an array for
multiple documents.
So that option goes out of the window. There is no way to update large chunks in a single command, as you'll have to target the array elements individually.
If you are going to target them individually, it is advisable to target them by the unique ids that are being generated. To automate the whole process, you can choose whichever efficient method suits the backend you are using.
You can run several processes in parallel, which would help you finish the task in less time, but it won't be possible to do everything in one go because MongoDB doesn't support that.
It is also advisable that, instead of maintaining several subdocuments, you use a separate collection, as that will ease the whole process. Maintain a field to map your two collections.
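As a rough illustration of targeting array elements individually by id, the sketch below builds one filter/update pair per subdocument using the positional $ operator, then groups the pairs into batches that could be handed to a driver's bulk-write call. It is plain Python with no driver dependency; the helper names, the batch size, and the sample ids are assumptions, and the same document shapes would apply in the Node.js driver.

```python
def build_subdoc_updates(changes):
    """Build (filter, update) pairs that each target one array element.

    `changes` is a list of (parent_id, sub_id, new_data) tuples.
    """
    operations = []
    for parent_id, sub_id, new_data in changes:
        # Match the parent document and the one array element by its _id.
        filter_doc = {"_id": parent_id, "subData._id": sub_id}
        # The positional operator `$` refers to the element matched above.
        update_doc = {"$set": {"subData.$.data": new_data}}
        operations.append((filter_doc, update_doc))
    return operations


def in_batches(operations, size):
    """Yield the operations in lists of at most `size` items."""
    for start in range(0, len(operations), size):
        yield operations[start:start + size]


# Made-up ids and values, for illustration only.
changes = [
    ("parent-1", "sub-1", "new value A"),
    ("parent-1", "sub-2", "new value B"),
    ("parent-2", "sub-9", "new value C"),
]
operations = build_subdoc_updates(changes)
batch_sizes = [len(batch) for batch in in_batches(operations, 2)]
print(operations[0])
print(batch_sizes)
```

Each batch could then be processed by a separate worker, which is one way to get the parallelism mentioned above.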
References
https://jira.mongodb.org/browse/SERVER-831
https://jira.mongodb.org/browse/SERVER-1243
https://www.nodechef.com/docs/cloud-search/updates-documents-nested-in-arrays

Couch db bulk operations

So I've been trying to move data from one database to another. I've already moved the documents, but I need to clear the ones I've moved from the old database. I've been using Ektorp's executeBulk to perform bulk operations. But for some reason I keep getting a document update conflict when I try to bulk delete by inserting _deleted.
I might be doing it wrong; here is what I did.
Fetch in bulk with include docs. (For some reason, this doesn't work with just id and rev.)
Then add the _deleted field to each document.
Post using executeBulk.
It works for some documents but keeps returning a document update conflict for others.
Any solutions or suggestions, please?
This is the preferred way of deleting docs in bulk:
List<Object> bulkDocs = ...
MyClass toBeDeleted = ...
bulkDocs.add(BulkDeleteDocument.of(toBeDeleted));
db.executeBulk(bulkDocs);
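On the conflicts themselves: a bulk request reports success or failure per document, and a "conflict" entry generally means the _rev that was sent is no longer the document's current revision, so those documents need their revision re-fetched before retrying. The Python sketch below (not Ektorp code) shows one way to separate the two groups from a _bulk_docs-style response; the sample response and the helper name are made up for illustration.

```python
def split_bulk_response(response_rows):
    """Return (ok_ids, conflict_ids) from a _bulk_docs-style response list."""
    ok_ids = [row["id"] for row in response_rows if row.get("ok")]
    conflict_ids = [
        row["id"] for row in response_rows if row.get("error") == "conflict"
    ]
    return ok_ids, conflict_ids


# Made-up response in the shape CouchDB uses for bulk results.
sample_response = [
    {"ok": True, "id": "doc-1", "rev": "3-aaa"},
    {"id": "doc-2", "error": "conflict", "reason": "Document update conflict."},
    {"ok": True, "id": "doc-3", "rev": "2-bbb"},
]
ok_ids, conflict_ids = split_bulk_response(sample_response)
print(ok_ids)
print(conflict_ids)
```

The ids in conflict_ids are the ones to re-read and resubmit with their latest revision.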
If you only need a way to delete/update docs in bulk and you don't need to necessarily implement it in your own software, you can use the great couchapp at:
https://github.com/harthur/costco
You need to upload it to your own server with a couchapp deployment tool, and use a function like
function(doc) {
    if (doc.istodelete) // replace this or remove to delete all docs
        return null;
}
Read instructions and examples

How to get last created document in couchdb?

How can I get the last created document in CouchDB? Maybe I can somehow use the _changes feature of CouchDB? But the documentation says that I can only get a list of documents ordered from the first created document, and there is no way to change the order.
So how can I get the last created document?
You can get the changes feed in descending order, as it accepts the same descending parameter as a view.
GET /dbname/_changes?descending=true
You can use limit= as well, so:
GET /dbname/_changes?descending=true&limit=1
will give the latest update.
Your only surefire way to get the last created document is to include a timestamp (created_at or something) with your document. From there, you just need a simple view to output all the docs by their creation date.
I was going to suggest using the last_seq information from the database, but the sequence number changes with every single write, and replication also complicates the matter further.
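The timestamp approach could look roughly like this. The sketch defines a design document whose view emits created_at as the key, then builds the query URL that asks for the single newest row. The design document name, view name, database URL, and the created_at field are all assumptions for illustration; nothing here contacts a server.

```python
from urllib.parse import urlencode

# Hypothetical design document: one view keyed by the creation timestamp.
design_doc = {
    "_id": "_design/docs",
    "views": {
        "by_created_at": {
            "map": "function (doc) { if (doc.created_at) { emit(doc.created_at, null); } }"
        }
    },
}


def latest_doc_url(db_url, design="docs", view="by_created_at"):
    """Build the view URL that returns only the most recently created doc."""
    query = urlencode(
        {"descending": "true", "limit": 1, "include_docs": "true"}
    )
    return f"{db_url.rstrip('/')}/_design/{design}/_view/{view}?{query}"


url = latest_doc_url("http://localhost:5984/dbname")
print(url)
```

This relies on created_at values that sort correctly as keys, for example ISO 8601 strings or numeric timestamps.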

How to get Post with Comments Count in single query with CouchDB?

How to get Post with Comments Count in single query with CouchDB?
I can use map-reduce to build a standalone view [{key: post_id, value: comments_count}], but then I have to hit the DB twice: one query to get the post, another to get comments_count.
There's also another way (Rails does this): count comments manually on the application server and save the result in a comment_count attribute of the post. But then we need to update the whole post document every time a comment is added or deleted.
It seems to me that CouchDB is not tuned for this approach. Unlike in an RDBMS, where we can update only the comment_count attribute, in CouchDB we are forced to update the whole post document.
Maybe there's another way to do it?
Thanks.
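The view's JSON response includes the row count as 'total_rows', so you don't need to compute anything yourself; just emit all the documents you want counted. Note that total_rows reports the number of rows in the whole view, not only the rows matching a particular key.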
{"total_rows":3,"offset":0,"rows":[
    {"id":...,"key":...,"value":doc1},
    {"id":...,"key":...,"value":doc2},
    {"id":...,"key":...,"value":doc3}]
}
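Reading the count on the client side might look like the sketch below. The response text is a made-up example in the same shape as the one above, with hypothetical comment ids and post keys. It reads total_rows for the whole view and, separately, counts the rows for one key, since the two numbers differ when the view holds comments for several posts.

```python
import json

# Made-up view response: three comment rows keyed by the post they belong to.
raw_response = """
{"total_rows": 3, "offset": 0, "rows": [
  {"id": "c1", "key": "post-1", "value": null},
  {"id": "c2", "key": "post-1", "value": null},
  {"id": "c3", "key": "post-2", "value": null}
]}
"""

view_result = json.loads(raw_response)

# Number of rows in the entire view, regardless of key.
total_in_view = view_result["total_rows"]

# Number of rows that belong to one specific post.
rows_for_post_1 = [row for row in view_result["rows"] if row["key"] == "post-1"]

print(total_in_view)
print(len(rows_for_post_1))
```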
