How to purge CouchDB documents

I need to fully delete, as in Purge, several documents from CouchDB version 2.1.
I have been reading about /db/_purge on docs.couchdb.org, but the process is not clear to me. There is a sentence "The format of the request must include the document ID and one or more revisions that must be purged".
How do I do this in Postman or in a browser? Do I actually enclose my doc _id & rev(s) in braces? I am struggling with how to correctly format a _purge request.

As of now, it is my understanding that _purge does not work in 2.0 and 2.1.
For more information look at this JIRA post.

Note that document purging is only supported in CouchDB versions prior to 2.0, and from 2.3 onward. Early versions of clustered CouchDB (2.0.x and 2.1.x) did not support purging, although this was poorly documented!
The documentation explains, and provides an example:
{
"c6114c65e295552ab1019e2b046b10e": [
"3-b06fcd1c1c9e0ec7c480ee8aa467bf3b",
"3-0e871ef78849b0c206091f1a7af6ec41"
]
}
So that means in the format of:
{
"<doc id>": [
"<rev>",
"<rev>"
]
}
This should be the body of your HTTP request, with a Content-Type of application/json. You won't be able to do that in a browser, without using JavaScript.
With curl it would look like:
curl -X POST http://<server url>/<database>/_purge -H 'Content-Type: application/json' -d '{"<doc id>":["<rev1>","<rev2>"]}'
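If you would rather script it than use curl or Postman, here is a minimal JavaScript sketch of the same request. The helper names buildPurgeBody and purgeDocs are made up for illustration, the server URL and database name are placeholders, and it assumes Node 18+ (for the global fetch) and a CouchDB version that supports _purge. Authentication is left out.

```javascript
// Build the JSON body for POST /<database>/_purge:
// an object mapping each document ID to an array of revisions to purge.
function buildPurgeBody(revsById) {
  const body = {};
  for (const [docId, revs] of Object.entries(revsById)) {
    if (!Array.isArray(revs) || revs.length === 0) {
      throw new Error(`No revisions given for document ${docId}`);
    }
    body[docId] = [...revs];
  }
  return body;
}

// Hypothetical wrapper that sends the purge request (assumes Node 18+ global fetch).
async function purgeDocs(serverUrl, database, revsById) {
  const res = await fetch(`${serverUrl}/${encodeURIComponent(database)}/_purge`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(buildPurgeBody(revsById)),
  });
  if (!res.ok) throw new Error(`Purge failed with HTTP ${res.status}`);
  return res.json();
}
```

In Postman the equivalent is a POST to the same URL with a raw JSON body and the Content-Type header set to application/json.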

I just wrote this tool to purge and compact databases in CouchDB.
https://github.com/nisbus/couch_db_cleaner
I hope the documentation in the README is clear and it helps.

Related

Get couchdb views to process design-docs as well as other docs

I am trying to get a view in couchdb to include design docs. I have done it in the past, but can not get it to work today.
In a past couchapp there is a file called options.json that contains the text:
{
"include_design": "true"
}
This results in the design doc containing
"options": {
"include_design": "true"
},
I added this to the new project, but still the design doc is not processed by my views. Is there something that I missed?
CouchDB 1.7.1

According to this documentation, include_design option is a boolean.
I double-checked CouchDB to see how it saves Boolean values by adding a document to a sample database with a Boolean value for one of the keys:
$ cat doc--0000
{"time":"2011", "address":"CT", "include":true}
$ curl -k -X PUT https://admin:**@192.168.1.106:6984/sample/doc--0000 -d @doc--0000
{"ok":true,"id":"doc--0000","rev":"1-e269c17275e2d21ba9100cd65b304d70"}
$ curl -k -X GET https://admin:**@192.168.1.106:6984/sample/doc--0000
{"_id":"doc--0000","_rev":"1-e269c17275e2d21ba9100cd65b304d70","time":"2011","address":"CT","include":true}
The double-check confirms that the Boolean values are saved as true NOT "true". I'm not sure, maybe that's the cause of the issue.

@user3405291 is correct: the problem is the string "true" instead of the boolean true. CouchDB doesn't save this. Your view runs on the server as JavaScript, so write it the way you would write JavaScript anywhere else.
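To make the difference concrete, here is a small sketch of a design document with the option set as a real boolean. The design doc ID and the view name are invented for the example. In a couchapp, the equivalent would be an options.json containing {"include_design": true} without quotes around true.

```javascript
// Hypothetical design document; "_design/example" and the view name are made up.
const designDoc = {
  _id: '_design/example',
  options: {
    include_design: true, // boolean true, not the string "true"
  },
  views: {
    all_docs_incl_design: {
      // CouchDB stores map functions as source strings.
      map: 'function (doc) { emit(doc._id, null); }',
    },
  },
};

// This is what would be sent as the PUT body for /<database>/_design/example
const designDocJson = JSON.stringify(designDoc, null, 2);
console.log(designDocJson);
```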

Possible to replicate couch database with illegal name

I'm using this command to replicate a 100mb database
curl -H 'Content-Type: application/json' \
-X POST http://localhost:5984/_replicate \
-d '{"source": "http://example.com:5984/bad_name_with_underscore", "target": "good_name"}'
I cannot replicate because CouchDB says the source database name contains illegal characters. I understand that the CouchDB developers discourage users from creating databases with bad names, but reading from one should do no harm.
I'm not an admin of the source CouchDB, so I tried exporting the database as JSON and bulk-loading it into a new database. But I got {"error":"bad_request","reason":"Missing JSON list of 'docs'"}, even after changing the structure of dump.json to {"docs": [...]}.
Is there any other way to replicate this database, which has underscores in its name?

I resolved the problem by using a client, PouchDB. Here is the code.
const PouchDB = require('pouchdb')
const source = new PouchDB("http://example.com:5984/bad_name_with_underscore")
source.replicate.to("http://localhost:5984/good_name")
.on('complete', console.log)
.on('error', console.error)
This works pretty well, so I post this to share with you all.
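For the export-and-bulk-load route mentioned in the question, a rough sketch of reshaping a dump is below. It assumes the dump came from _all_docs?include_docs=true (the question does not say how it was exported), and the function name toBulkDocsPayload is made up. The _bulk_docs endpoint expects a top-level "docs" array, and new_edits: false asks CouchDB to keep the revisions as given. Documents with attachments need extra handling that this does not cover.

```javascript
// Convert an _all_docs?include_docs=true response into a _bulk_docs payload.
function toBulkDocsPayload(allDocsResponse) {
  const docs = allDocsResponse.rows
    .map((row) => row.doc)
    .filter((doc) => doc != null); // rows have no "doc" if include_docs was not set
  // new_edits: false stores the revisions as given instead of generating new ones.
  return { docs, new_edits: false };
}

// Tiny made-up dump, only to show the shape of the input and output.
const sampleDump = {
  total_rows: 2,
  offset: 0,
  rows: [
    { id: 'a', key: 'a', value: { rev: '1-abc' }, doc: { _id: 'a', _rev: '1-abc', n: 1 } },
    { id: 'b', key: 'b', value: { rev: '2-def' }, doc: { _id: 'b', _rev: '2-def', n: 2 } },
  ],
};
const bulkPayload = toBulkDocsPayload(sampleDump);
console.log(JSON.stringify(bulkPayload));
```

The result would be sent as the body of a POST to /good_name/_bulk_docs with a Content-Type of application/json.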

Using a script to conditionally update a document in Elasticsearch

I have a use case in which concurrent update requests may hit my Elasticsearch cluster. To make sure that a stale event (one made irrelevant by a newer request) does not update a document after a newer event has already reached the cluster, I would like to pass a script with my update requests that compares a field to determine whether the incoming request is still relevant. The request would look like this:
curl -XPOST 'localhost:9200/test/type1/1/_update' -d '
{
"script": " IF ctx._source.user_update_time > my_new_time THEN do not update ELSE proceed with update",
"params": {
"my_new_time": "2014-09-01T17:36:17.517"
},
"doc": {
"name": "new_name"
},
"doc_as_upsert": true
}'
Is the pseudo code I wrote in the "script" field possible in Elasticsearch ? If so, I would love some help with the syntax (groovy, python or javascript).
Any alternative approach suggestions would be greatly appreciated too.

Elasticsearch has built-in optimistic concurrency control.
The way it works is that the Update API allows you to use the version parameter to control whether the update should proceed.
So taking your above example, the first index/update operation would create a document with version: 1. Then take the case where you have two concurrent requests. Both components A and B will send an updated document; both initially retrieved the document with version: 1 and will specify that version in their request (see version=1 in the query string below). Elasticsearch will update the document if and only if the provided version is the same as the current one.
Component A and B both send this, but A's request is the first to make it:
curl -XPOST 'localhost:9200/test/type1/1/_update?version=1' -d '{
"doc": {
"name": "new_name"
},
"doc_as_upsert": true
}'
At this point the version of the document will be 2 and B's request will end up with HTTP 409 Conflict, because B assumed the document was still at version 1, even though the version increased in the meantime due to A's request.
B can definitely retrieve the document with the new version (i.e. 2) and try its update again, but this time with ?version=2 in the URL. If it's the first one to reach ES, the update will succeed.
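The read-then-retry loop that B follows can be sketched roughly as below. This is only an illustration of the control flow: updateWithRetry and makeFakeStore are invented names, and the in-memory store stands in for Elasticsearch so the example runs without a cluster. In real code getVersion would be a GET of the document and tryUpdate an _update call with ?version=N that reports false on HTTP 409.

```javascript
// Retry an update under optimistic concurrency control.
// getVersion(): resolves to the document's current version.
// tryUpdate(version): resolves to true on success, false on a version conflict (HTTP 409).
async function updateWithRetry(getVersion, tryUpdate, maxAttempts = 3) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const version = await getVersion();
    if (await tryUpdate(version)) {
      return { ok: true, attempts: attempt };
    }
    // Conflict: someone else updated first, so re-read the version and try again.
  }
  return { ok: false, attempts: maxAttempts };
}

// In-memory stand-in for a versioned document, for illustration only.
function makeFakeStore() {
  const doc = { version: 1, name: 'old_name' };
  return {
    doc,
    getVersion: async () => doc.version,
    update: async (expectedVersion, name) => {
      if (expectedVersion !== doc.version) return false; // would be 409 Conflict
      doc.name = name;
      doc.version += 1;
      return true;
    },
  };
}
```

For the stale-event case in the question, B would also compare its own timestamp with the stored user_update_time after re-reading, and give up rather than retry if the stored document is already newer.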

I think the script should be like this (it only writes when the stored time is older than the incoming one):
"script": "if(ctx._source.user_update_time < my_new_time) ctx._source.user_update_time=my_new_time;"
or
"script": "ctx._source.user_update_time > my_new_time ? ctx.op=\"none\" : ctx._source.user_update_time=my_new_time"

I am using river-couchdb with ElasticSearch and would like to reset its last_seq to 0. Anybody know if that's possible?

I am using an Elasticsearch plugin called river-couchdb to create a full-text index of my CouchDB. It uses the CouchDB _changes API to listen for documents. I assume it keeps track of the last seq from the _changes API.
Sometimes we rebuild our CouchDB and set our last-seq back to 0. The only way I've found to reset the river-couchdb seq is to delete both its index and the river itself and recreate them. Is there a better way?

As far as I remember, you have a _seq document in your _river index for your river.
This document has a _last_seq entry.
If you want to restart from scratch, I think you can simply delete this document:
curl -XDELETE localhost:9200/_river/yourrivername/_seq
Does it help?

From couchdb-river manual:
starting-at-a-specific-sequence
curl -XDELETE localhost:9200/_river/yourrivername/_seq
curl -XPUT 'localhost:9200/_river/yourrivername/_seq' -d '
{
"couchdb": {
"last_seq": "100"
}
}'
Elasticsearch can't update last_seq without deleting the old document first, which is why the DELETE comes before the PUT.

How can you simulate a conflict in CouchDB without using replication?

I'd like to write a unit test for my app that simulates a conflict during replication. Is there a way to simulate a conflict using only a single CouchDB database and server?

I assume you want to get a document containing a conflict in your database, rather than a 409 Conflict response?
So, create a document in the database with a known _id:
$ curl http://localhost:5984/scratch/foo -X PUT -H "Content-Type: application/json" -d '{}'
{"ok":true,"id":"foo","rev":"1-967a00dff5e02add41819138abb3284d"}
Then use the bulk docs API with the all_or_nothing: true option to update the same document with a deliberately bad or no _rev, adding some different document attributes for good measure:
$ curl http://localhost:5984/scratch/_bulk_docs -X POST -H "Content-Type: application/json" -d '{"all_or_nothing": true, "docs": [{"_id": "foo", "abc": 123}]}'
[{"id":"foo","rev":"1-15c813a2b4b312c6915821b01a1986c5"}]
You should then have a conflict in the document:
$ curl 'http://localhost:5984/scratch/foo?conflicts=true'
{"_id":"foo","_rev":"1-967a00dff5e02add41819138abb3284d","_conflicts":["1-15c813a2b4b312c6915821b01a1986c5"]}
You can also perform a normal query with ?new_edits=false as described by CouchDB committer Randall Leeds.
$ curl http://localhost:5984/scratch?new_edits=false -X POST -H "Content-Type: application/json" -d '{"_id": "foo", "abc": 123}'
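Another way to get the same effect, which as far as I know does not depend on all_or_nothing, is to send _bulk_docs with new_edits set to false and an explicit, made-up _rev. A rough sketch of building that body is below. The function name buildConflictPayload is invented, and the random revision suffix is just a stand-in for a real revision hash.

```javascript
const crypto = require('crypto');

// Build a _bulk_docs body that writes a sibling revision of an existing document.
// With new_edits: false CouchDB stores the given _rev as-is rather than
// checking it against the current revision, so a second branch appears.
function buildConflictPayload(docId, generation, fields) {
  const fakeRevHash = crypto.randomBytes(16).toString('hex'); // made-up 32-char hex suffix
  return {
    new_edits: false,
    docs: [{ ...fields, _id: docId, _rev: `${generation}-${fakeRevHash}` }],
  };
}

const conflictPayload = buildConflictPayload('foo', 1, { abc: 123 });
console.log(JSON.stringify(conflictPayload));
```

Posting this body to /scratch/_bulk_docs with a Content-Type of application/json, and then fetching the document with conflicts=true, should show one of the two revisions listed under _conflicts.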

Googled further after asking the question, and it looks like the answer is to use the all-or-nothing mode of the bulk document API.
http://wiki.apache.org/couchdb/HTTP_Bulk_Document_API
Look near the end of the page.

Just post two documents with the same _id attribute. This creates a conflict since the 2nd doc will not contain the proper _rev attribute. Remember, you need to include the latest _rev attribute in each subsequent post so that CouchDB knows you are up to date.
Also, you can create two databases on the same server and replicate between those.
