Listing active replications in CouchDB 1.1.0

I am bootstrapping replications in CouchDB by POSTing to localhost:5984/_replicate. This URL only accepts POST requests.
There is also a second URL: localhost:5984/_replicator, which accepts PUT, GET and DELETE requests.
When I configure a replication by POSTing to _replicate, it starts, but I cannot get any information about it. It is also not listed in _replicator.
How can I get the list of active replications?
How can I cancel an active replication?
Edit: also, how can I trigger replications with the _replicator method?
Thanks to comments by JasonSmith, I arrived at the following solution: PUTting to _replicator requires using the full URL (including authentication credentials) for the target database. This is not the case when using the _replicate URL, which is happy with just the name of the target database (I am talking about pull replications here). The reason, as far as I can tell, is explained here (see section 8, "The user_ctx property and delegations").

The original API was a special URL, /_replicate, where you tell Couch what to do and it tells you the result. The newer system is a regular database, called /_replicator, in which you create documents telling Couch what to do. The document format is the same as the older _replicate format; however, CouchDB will update the document as the replication proceeds (for example, it will add a field "state":"triggered" or "state":"complete", etc.).
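For example, a minimal sketch of starting a pull replication through _replicator (the document name, credentials, and remote URL are placeholders):

curl -X PUT http://admin:secret@localhost:5984/_replicator/my_rep \
  -H "Content-Type: application/json" \
  -d '{"source": "http://user:pass@remote.example.com:5984/iris", "target": "iris", "continuous": true}'

CouchDB then updates the my_rep document with replication-state fields as the job proceeds, and DELETEing the document cancels the job.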
To get a list of active replications, GET /_active_tasks as the server admin. For example (formatted):
curl http://admin:secret@localhost:5984/_active_tasks
[ { "type": "Replication"
, "task": "`1bea06f0596c0fe6a1371af473a95aea+create_target`: `http://jhs.iriscouch.com/iris/` -> `iris`"
, "started_on": 1315877897
, "updated_on": 1315877898
, "status": "Processed 83 / 119 changes"
, "pid": "<0.224.0>"
}
, { "type": "Replication"
, // ... etc ...
}
]
The wiki has instructions for canceling CouchDB replication. Basically, you send the same source and target again and also add "cancel":true.
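For example, a minimal sketch (the source and target must match the original request exactly):

curl -X POST http://localhost:5984/_replicate \
  -H "Content-Type: application/json" \
  -d '{"source": "http://jhs.iriscouch.com/iris/", "target": "iris", "cancel": true}'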

Related

ExpressJS: How to cache on demand

I'm trying to build a REST API with Express, Sequelize (PostgreSQL dialect) and Node.
Essentially I have two endpoints:
Method  Endpoint      Description
GET     /api/players  Get players info, including assets
POST    /api/assets   Create an asset
There is also a mechanism that updates a property of assets (say, price) every 30 seconds.
Goal
I want to cache the results of GET /api/players, but I want some control over it: whenever a user creates an asset (using POST /api/assets), the next request to GET /api/players should return fresh data (including the property that updates every 30 seconds), which is then cached until the next update cycle.
Expected
The following should demonstrate it:
GET /api/players
JSON Response:
[
  {
    "name": "John Doe",
    "assets": [
      {
        "id": 1,
        "price": 10
      }
    ]
  }
]
POST /api/assets
JSON Request:
{
  "id": 2
}
GET /api/players
JSON Response:
[
  {
    "name": "John Doe",
    "assets": [
      {
        "id": 1,
        "price": 10
      },
      {
        "id": 2,
        "price": 7.99
      }
    ]
  }
]
What I have managed to do so far
I have made the routes, but GET /api/players has no cache mechanism and queries the database every time it is requested.
Some solutions I have found, but none seem to fit my scenario:
apicache (https://www.youtube.com/watch?v=ZGymN8aFsv4&t=1360s): but I don't have a fixed duration, because a user can create an asset at any time.
Example implementation
I have seen a (kind of) similar implementation of what I want in GitHub Actions workflows, where caching uses a key: unless the key changes, the same packages are reused instead of being installed every time (example: https://github.com/python-discord/quackstack/blob/6792fd5868f28573bb8f9565977df84e7ba50f42/.github/workflows/quackstack.yml#L39-L52).
Is there any package to do that? Then, while processing POST /api/assets, I could change the key in its handler so that GET /api/players returns updated results (I could also change the key in the 30-second cycle), and afterwards it would serve the cached result (until the key changes again in the next cycle).
Note: If you have a solution, please try to stick with npm packages rather than something like Redis, unless it's the only/best solution.
Thanks in advance!
(P.S. I'm a beginner and this is my first question on SO)
Typically caching is done with the help of Redis. Redis is an in-memory key-value store. You could handle the cache in the following manner:
In your handler for the POST operation, update or invalidate the cached entry for players.
In your handler for the GET operation, if Redis has the entry in cache, return it; otherwise run the query, add the entry to the cache, and return the data.
Alternatively, you could use Memcached.
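A minimal sketch of that pattern with the redis npm package (v4 API); Player and Asset stand in for the question's Sequelize models and are hypothetical:

const express = require('express');
const { createClient } = require('redis');

const app = express();
app.use(express.json());
const redis = createClient();

app.get('/api/players', async (req, res) => {
  const cached = await redis.get('players');
  if (cached) return res.json(JSON.parse(cached)); // cache hit: skip the DB

  const players = await Player.findAll({ include: Asset }); // hypothetical models
  await redis.set('players', JSON.stringify(players), { EX: 30 }); // expire with the 30 s cycle
  res.json(players);
});

app.post('/api/assets', async (req, res) => {
  const asset = await Asset.create(req.body);
  await redis.del('players'); // invalidate so the next GET returns fresh data
  res.status(201).json(asset);
});

redis.connect().then(() => app.listen(3000));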
A bit late to this answer, but I was looking for a similar solution. I found that the apicache library not only allows caching for specified durations, but the cache can also be cleared manually.
apicache.clear([target]) - clears cache target (key or group), or entire cache if no value passed, returns new index.
Here is an example for your implementation:
const apicache = require('apicache');

// POST /api/assets
app.post('/api/assets', function(req, res, next) {
  // ... update assets, then clear the cache so the next GET is fresh
  apicache.clear();
  // or clear only the cached players group by passing its key:
  // apicache.clear('players');
  res.send(response);
});
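For completeness, a sketch of the cached GET side; the '30 seconds' duration mirrors the question's update cycle, and the 'players' group ties the route to the apicache.clear('players') call above (fetchPlayers is a hypothetical query function):

const cache = apicache.middleware;

// GET /api/players: served from cache until cleared or until the duration lapses
app.get('/api/players', cache('30 seconds'), async function(req, res) {
  req.apicacheGroup = 'players'; // lets apicache.clear('players') target this route
  res.json(await fetchPlayers()); // hypothetical DB query
});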

Using a script to conditionally update a document in Elasticsearch

I have a use case in which concurrent update requests hit my Elasticsearch cluster. To make sure that a stale event (one made irrelevant by a newer request) does not update a document after a newer event has already reached the cluster, I would like to pass a script with my update requests that compares a field to determine whether the incoming request is still relevant. The request would look like this:
curl -XPOST 'localhost:9200/test/type1/1/_update' -d '
{
"script": " IF ctx._source.user_update_time > my_new_time THEN do not update ELSE proceed with update",
"params": {
"my_new_time": "2014-09-01T17:36:17.517""
},
"doc": {
"name": "new_name"
},
"doc_as_upsert": true
}'
Is the pseudocode I wrote in the "script" field possible in Elasticsearch? If so, I would love some help with the syntax (Groovy, Python or JavaScript).
Any alternative approach suggestions would be greatly appreciated too.
Elasticsearch has built-in optimistic concurrency control (see here and here).
The way it works is that the Update API allows you to use the version parameter to control whether the update should proceed or not.
So taking your example above: the first index/update operation creates a document with version: 1. Now take the case where you have two concurrent requests. Components A and B have both retrieved the document at version: 1, and both send an updated document specifying that version in their request (see version=1 in the query string below). Elasticsearch will update the document if and only if the provided version is the same as the current one.
Components A and B both send this, but A's request is the first to arrive:
curl -XPOST 'localhost:9200/test/type1/1/_update?version=1' -d '{
"doc": {
"name": "new_name"
},
"doc_as_upsert": true
}'
At this point the version of the document will be 2, and B's request will fail with HTTP 409 Conflict, because B assumed the document was still at version 1, even though the version has increased in the meantime due to A's request.
B can then retrieve the document at the new version (i.e. 2) and retry its update, this time with ?version=2 in the URL. If it is the first to reach ES, the update will succeed.
I think the script should be like this:
"script": "if(ctx._source.user_update_time > my_new_time) ctx._source.user_update_time=my_new_time;"
or
"script": "ctx._source.user_update_time > my_new_time ? ctx.op=\"none\" : ctx._source.user_update_time=my_new_time"

Is a partial representation of document a valid "set of changes" as per HTTP PATCH RFC?

Here is what RFC 5789 says:
The PATCH method requests that a set of changes described in the request entity be applied to the resource identified by the Request-URI. The set of changes is represented in a format called a "patch document" identified by a media type. If the Request-URI does not point to an existing resource, the server MAY create a new resource, depending on the patch document type (whether it can logically modify a null resource) and permissions, etc.
The difference between the PUT and PATCH requests is reflected in the way the server processes the enclosed entity to modify the resource identified by the Request-URI. In a PUT request, the enclosed entity is considered to be a modified version of the resource stored on the origin server, and the client is requesting that the stored version be replaced. With PATCH, however, the enclosed entity contains a set of instructions describing how a resource currently residing on the origin server should be modified to produce a new version.
Let's say I have { "login": "x", "enabled": true }, and I want to disable it.
According to the post "Please. Don't Patch Like An Idiot.", the proper PATCH request would be:
[{ "op": "replace", "path": "/enabled", "value": false }]
However, let's take this request:
{ "enabled": false }
It also 'contains a set of instructions describing how a resource currently residing on the origin server should be modified'; the only difference is that a plain JSON property is used instead of a JSON Patch operation object.
It seems less powerful, but array changes can have some other special syntax if required (e.g. {"a":{"add":[], "remove":[]}}), and server logic might not be able to handle anything more powerful anyway.
Is it an improper PATCH request as per RFC? And if so, why?
And, on the other hand, would { "op": "disable" } be a correct PATCH request?
the only difference is that JSON property is used instead of JSON object.
It's actually a bit deeper than that. The reference to RFC 6902 is important: the first request has a Content-Type of application/json-patch+json, but the second is application/json.
The important thing is that you use a 'diff media type', one that is suitable for this purpose. You don't have to use JSON Patch (I'm a big fan of JSON Merge Patch), but you can't just use anything you want. What you're asking about in the second part is basically 'can I make my own media type?', and the answer is 'yes'; just please document it and register it with the IANA.
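For instance, the question's second body becomes a valid PATCH once it is labeled as JSON Merge Patch (RFC 7396); the resource path is illustrative:

PATCH /users/x HTTP/1.1
Host: example.org
Content-Type: application/merge-patch+json

{ "enabled": false }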

Can I retrieve all revisions of a deleted document?

I know I can retrieve all revisions of an "available" document, but can I retrieve the last "available" version of a deleted document? I do not know the revision ID prior to the delete. This is the command I am currently running; it returns {"error":"not_found","reason":"deleted"}:
curl -X GET http://localhost:5984/test_database/a213ccad?revs_info=true
I've had this problem, trying to recover a deleted document; here is my solution:
0) Until you run a compaction, you can get the deletion history from _changes, e.g.:
curl http://example.iriscouch.com/test/_changes
1) You'll see deleted documents with their $id and $rev; PUT an empty document as a new version, e.g.:
curl -X PUT http://example.iriscouch.com/test/$id?rev=$rev -H "Content-Type: application/json" -d '{}'
2) Now you can get all revision info, e.g.:
curl http://example.iriscouch.com/test/$id?revs_info=true
See also Retrieve just deleted document
Besides _changes, another good way to do this is to use keys with _all_docs:
GET $MYDB/_all_docs?keys=["foo"] ->
{
  "offset": 0,
  "rows": [
    {
      "id": "foo",
      "key": "foo",
      "value": {
        "deleted": true,
        "rev": "2-eec205a9d413992850a6e32678485900"
      }
    }
  ],
  "total_rows": 0
}
Note that it has to be keys; key will not work, because only keys returns info for deleted docs.
You can get the last revision of a deleted document, but first you must determine its revision ID. To do that, query the _changes feed and scan for the document's deletion record; this will contain the last revision, and you can then fetch it using docid?rev=N-XXXXX.
I remember some mailinglist discussion of making this easier (as doing a full scan of the changes feed is obviously not ideal for routine usage), but I'm not sure anything came of it.
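A sketch of that flow against the question's database (revision values are hypothetical). The deletion revision itself is only a tombstone, so you ask for its ancestry and then fetch the revision preceding the delete:

# Find the tombstone's ancestry, then fetch the pre-delete revision:
curl 'http://localhost:5984/test_database/a213ccad?rev=2-eec205a9d413992850a6e32678485900&revs=true'
curl 'http://localhost:5984/test_database/a213ccad?rev=1-af856babf9cf746b48ae999645f9541e'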
I've hit this several times recently, so for anyone else wandering by ...
This question typically results from a programming model that needs to know which document was deleted. Since user keys such as 'type' don't survive deletion, and _id is best assigned by Couch, it would often be nice to peek under the covers and see something about the doc that was deleted. An alternative is to have a process that sets deleted:true (no underscore) on documents, and to adjust any listener filters, etc., to look for deleted:true. One of the processes can then actually delete the document. This means that any process triggering on the document doesn't need to track an _id for eventual deletion.
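A minimal sketch of such a soft-delete filter (the design doc name app and filter name soft_deleted are illustrative):

// _design/app, filters.soft_deleted
function (doc, req) {
  return doc.deleted === true; // the application's flag, not CouchDB's _deleted
}

// Listeners then poll:
// curl 'http://localhost:5984/test_database/_changes?filter=app/soft_deleted'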

Change notification in CouchDB when a field is set

I'm trying to get notifications in a CouchDB change poll as soon as a pre-defined field is set or changed. I've already had a look at filters that can be used for filtering change events (db/_changes?filter=myfilter). However, I've not yet found a way to include this temporal information, because you can only see the current version of the document in these filter functions.
Is there any possibility to create such a filter?
If it does not work, I could move my field to a separate database and then only poll for changes in that db, but I'd prefer to keep my data together for obvious reasons.
Thanks in advance!
You are correct: filters and _changes feeds can only see snapshots of a document. What you need is a function which can see the old document and the new document and act accordingly, but that is unavailable in _filters and _changes.
Obviously your client code knows when it updates that field, so you could change the client code; however, there is a better solution.
Update functions can access both documents. I suggest you make an _update function which notices the field change and flags it in the document. Next you add a simple filter checking for that flag. The best part is, you can use a rewrite function to make the HTTP API exactly the same as before.
1. Create an update function to flag interesting updates
Your _design/myapp would contain {"updates": {"smart_updater": "(see below)"}}.
Update functions are very flexible (see my recent update handlers walkthrough). However, we only want to mimic the normal HTTP/JSON API.
Your updates.smart_updater field would look like this:
function (doc, req) {
  var INTERESTING = 'dollars'; // Set me to the interesting field.
  var newDoc = JSON.parse(req.body);
  if (newDoc.hasOwnProperty(INTERESTING)) {
    // The field was set (which includes 0, false, null, and undefined
    // values). Test newDoc[INTERESTING] instead if those values
    // should not trigger this code.
    if ((doc === null) || (doc[INTERESTING] !== newDoc[INTERESTING])) {
      // The field was changed or created!
      newDoc.i_was_changed = true;
    }
  }
  if (!newDoc._id) {
    // A UUID generator would be better here.
    newDoc._id = req.id || Math.random().toString();
  }
  // Return the same JSON the vanilla Couch API does.
  return [newDoc, {json: {'id': newDoc._id}}];
}
Now you can PUT or POST to /db/_design/myapp/_update/smart_updater/[doc_id] and it will feel just like the normal API, except that if you update the dollars field, it will add an additional flag, i_was_changed. That is how you will find this change later.
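For example (document ID and field value are illustrative):

curl -X PUT http://localhost:5984/db/_design/myapp/_update/smart_updater/mydoc \
  -H "Content-Type: application/json" \
  -d '{"dollars": 42}'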
2. Filter for documents with the changed field
This is very straightforward:
function (doc, req) {
  return doc.i_was_changed;
}
Now you can query the _changes feed with a ?filter= parameter. (Replication also supports this filter, so you could pull to your local system all documents which most recently changed/created the field.)
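For example, assuming the filter above is stored as filters.flagged in _design/myapp:

curl 'http://localhost:5984/db/_changes?filter=myapp/flagged'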
That is the basic idea. The remaining steps will make your life easier if you already have lots of client code and do not want to change the URLs.
3. Use rewriting to keep the HTTP API the same
This is available in CouchDB 0.11, and the best resource is Jan's blog post, nice URLs in CouchDB.
Briefly, you want a vhost which sends all traffic to your rewriter (which itself is a flexible "bouncer" to all design doc functionality based on the URL).
curl -X PUT http://example.com:5984/_config/vhosts/example.com \
-d '"/db/_design/myapp/_rewrite"'
Then you want a rewrites field in your design doc, something like this (not tested):
[
  {
    "comment": "Updates should go through the update function",
    "method": "PUT",
    "from": "db/*",
    "to": "db/_design/myapp/_update/*"
  },
  {
    "comment": "Creates should go through the update function",
    "method": "POST",
    "from": "db/*",
    "to": "db/_design/myapp/_update/*"
  },
  {
    "comment": "Everything else is just like normal",
    "from": "*",
    "to": "../../../*"
  }
]
(Once again, I got this code from examples and existing code I have lying around, but it's not 100% debugged. However, I think it makes the idea very clear. Also, remember this step is optional; the advantage is that you never have to change your client code.)
