Change notification in CouchDB when a field is set - couchdb

I'm trying to get notifications from a CouchDB changes poll as soon as a pre-defined field is set or changed. I've already had a look at filters that can be used to filter change events (db/_changes?filter=myfilter). However, I've not yet found a way to include this temporal information, because these filter functions only see the current version of the document.
Is there any possibility to create such a filter?
If it does not work, I could export my field to a separate database and then only poll for changes in that db, but I'd prefer to keep my data together for obvious reasons.
Thanks in advance!

You are correct: filter functions and the _changes feed only see a snapshot of the document. What you need is a function which can see both the old document and the new document and act accordingly, but that is unavailable in _filters and _changes.
Obviously your client code knows when it updates that field, so you could handle it there; however, there is a better solution.
Update functions can access both documents. I suggest you make an _update
function which notices the field change and flags that in the document. Next you
have a simple filter checking for that flag. The best part is, you can use a
rewrite function to make the HTTP API exactly the same as before.
1. Create an update function to flag interesting updates
Your _design/myapp would contain {"updates": {"smart_updater": "(see below)"}}.
Update functions are very flexible (see my recent update handlers walkthrough). However, here we only want to mimic the normal HTTP/JSON API.
Your updates.smart_updater field would look like this:
function (doc, req) {
  var INTERESTING = 'dollars'; // Set me to the interesting field.
  var newDoc = JSON.parse(req.body);

  if (newDoc.hasOwnProperty(INTERESTING)) {
    // The dollars field was set (which includes 0, false, null, and undefined
    // values). You might test newDoc[INTERESTING] instead if those values
    // should not trigger this code.
    if ((doc === null) || (doc[INTERESTING] !== newDoc[INTERESTING])) {
      // The field was changed or created!
      newDoc.i_was_changed = true;
    }
  }

  if (!newDoc._id) {
    // A UUID generator would be better here.
    newDoc._id = req.id || Math.random().toString();
  }

  // Return the same JSON the vanilla Couch API does.
  return [newDoc, {json: {'id': newDoc._id}}];
}
Now you can PUT or POST to /db/_design/myapp/_update/smart_updater/[doc_id] and it will feel just like the normal API, except that if you update the dollars field, it will add an additional flag, i_was_changed. That is how you will find this change later.
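For example, a rough sketch of a client call (hostname, doc ID and value are made up; as with the vanilla API, include the document's current _rev in the body when updating an existing document):

curl -X PUT http://localhost:5984/db/_design/myapp/_update/smart_updater/mydoc \
     -H 'Content-Type: application/json' \
     -d '{"dollars": 42}'
# The response mirrors the vanilla API, per the handler's return value: {"id":"mydoc"}
# The stored document now also carries "i_was_changed": true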
2. Filter for documents with the changed field
This is very straightforward:
function (doc, req) {
  return doc.i_was_changed;
}
Now you can query the _changes feed with a ?filter= parameter. (Replication also supports this filter, so you could pull to your local system all documents whose interesting field was most recently changed or created.)
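As an illustration, the filter might be registered and queried like this (the name i_was_changed is just a suggestion; it lives in the same design document):

// In _design/myapp:
"filters": {
  "i_was_changed": "function (doc, req) { return doc.i_was_changed; }"
}

// Then poll for matching changes:
// GET /db/_changes?filter=myapp/i_was_changed&feed=longpoll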
That is the basic idea. The remaining steps will make your life easier if you
already have lots of client code and do not want to change the URLs.
3. Use rewriting to keep the HTTP API the same
This is available in CouchDB 0.11, and the best resource is Jan's blog post,
nice URLs in CouchDB.
Briefly, you want a vhost which sends all traffic to your rewriter (which itself
is a flexible "bouncer" to all design doc functionality based on the URL).
curl -X PUT http://example.com:5984/_config/vhosts/example.com \
-d '"/db/_design/myapp/_rewrite"'
Then you want a rewrites field in your design doc, something like this (not tested):
[
  {
    "comment": "Updates should go through the update function",
    "method": "PUT",
    "from": "db/*",
    "to": "db/_design/myapp/_update/*"
  },
  {
    "comment": "Creates should go through the update function",
    "method": "POST",
    "from": "db/*",
    "to": "db/_design/myapp/_update/*"
  },
  {
    "comment": "Everything else is just like normal",
    "from": "*",
    "to": "../../../*"
  }
]
(Once again, I pieced this together from examples and existing code I have lying around, so it's not 100% debugged; but I think it makes the idea very clear. Also remember this step is optional; the advantage is that you never have to change your client code.)

Related

NodeJS - Simplify/Resolve GraphQL query

I am currently writing a Lambda authorizer for an AWS AppSync API; however, the authorization depends on the target resource being accessed.
Every resource has its own ACL listing the users and conditions for allowing access to it.
Currently the best I could find would be to get the identity of the caller, look at all the ACLs, and authorize the call while denying access to all the other resources, which is not only highly inefficient, but also extremely impractical, if not impossible.
The solution I had originally come up with was to get the target resource, retrieve its ACL and check whether the user fits the specified criteria. The problem is that I am unable to reliably determine what the target resource is. What I get from AWS is a request like this:
{
  "authorizationToken": "ExampleAUTHtoken123123123",
  "requestContext": {
    "apiId": "aaaaaa123123123example123",
    "accountId": "111122223333",
    "requestId": "f4081827-1111-4444-5555-5cf4695f339f",
    "queryString": "mutation CreateEvent {...}\n\nquery MyQuery {...}\n",
    "operationName": "MyQuery",
    "variables": {}
  }
}
So I only have the query string and variables, leaving the actual parsing to me. I managed to convert the query to an AST using graphql-js, but it's still extremely verbose and, most importantly, its structure varies greatly.
My first code to retrieve the target worked for the AppSync console queries, but not the Amplify front-end, for example. I also can't rely on something as simple as the variable name, as an attacker could quite easily craft a query with an arbitrary name, or even not use variables at all.
I thought about implementing this authorization logic within Lambda resolvers, which should be simpler in a way, but that would require me to use resolvers as authorizers, which doesn't seem ideal, and to implement the entire resolver logic when I just want the most trivial possible resolvers.
Ideally I'd like something like this:
/* Schema:
type Query {
  operationName(key: KEY!): responseType
} */

/* Query:
query arbitraryQueryName($var1: KEY!) {
  operationName(key: $var1) {
    field1
    field2
  }
} */

/* Variables:
{ "var1": "value1" } */

parsedQuery = {
  operation: "operationName",
  params: { "key": "value1" },
  fields: ["field1", "field2"]
};
Is there any way to resolve/simplify the queries from GraphQL to JSON/similar in a way that this information can be easily extracted?
Well, I couldn't find anything on it, so I made something myself.
On the off chance someone needs something similar, here's the gist with the code I used: https://gist.github.com/Iorpim/6544dad46060522dd0b17477871bc434
I didn't make it a proper full lib, as it's a very specific use case and it's likely a one-off, and I am also not sure how reliable it is, but it solves my problem!
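For anyone who only needs the general shape of such an extraction, here is a rough sketch (not the gist's code) using the graphql npm package. It handles only the simple single-root-field case shown above; simplifyQuery and its error handling are made up for illustration:

const { parse } = require('graphql');

function simplifyQuery(queryString, variables, operationName) {
  const ast = parse(queryString);

  // Pick the requested operation (a request may contain several).
  const op = ast.definitions.find(
    (def) =>
      def.kind === 'OperationDefinition' &&
      (!operationName || (def.name && def.name.value === operationName))
  );
  if (!op) throw new Error('Operation not found');

  // The root field is the resolver actually being called, e.g. operationName(key: $var1).
  const rootField = op.selectionSet.selections[0];

  // Resolve argument values: variables are looked up, simple literals taken as-is.
  const params = {};
  for (const arg of rootField.arguments || []) {
    params[arg.name.value] =
      arg.value.kind === 'Variable'
        ? variables[arg.value.name.value]
        : arg.value.value;
  }

  // Collect the requested top-level sub-fields.
  const fields = rootField.selectionSet
    ? rootField.selectionSet.selections.map((sel) => sel.name.value)
    : [];

  return { operation: rootField.name.value, params, fields };
}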

ExpressJS: How to cache on demand

I'm trying to build a REST API with express, sequelize (PostgreSQL dialect) and node.
Essentially I have two endpoints:
Method | Endpoint     | Desc.
GET    | /api/players | To get players info, including assets
POST   | /api/assets  | To create an asset
And there is a mechanism which updates a property (say price) of assets, over a cycle of 30 seconds.
Goal
I want to cache the results of GET /api/players, but I want some control over it: whenever a user creates an asset (using POST /api/assets), a request to GET /api/players right after that should return the updated data (including the property that updates every 30 seconds), and that result should be cached until it gets updated in the next cycle.
Expected
The following should demonstrate it:
GET /api/players
JSON Response:
[
  {
    "name": "John Doe",
    "assets": [
      {
        "id": 1,
        "price": 10
      }
    ]
  }
]
POST /api/assets
JSON Request:
{
  "id": 2
}
GET /api/players
JSON Response:
[
  {
    "name": "John Doe",
    "assets": [
      {
        "id": 1,
        "price": 10
      },
      {
        "id": 2,
        "price": 7.99
      }
    ]
  }
]
What I have managed to do so far
I have made the routes, but GET /api/players has no cache mechanism and basically queries the database every time it is requested.
Some solutions I have found, but none seem to fit my scenario:
apicache (https://www.youtube.com/watch?v=ZGymN8aFsv4&t=1360s): but I don't have a specific duration, because a user can create an asset at any time.
Example implementation
I have seen a (kind of) similar implementation of what I want in a GitHub Actions workflow for caching: you define a key, and unless the key has changed it reuses the same packages instead of installing them every time (example: https://github.com/python-discord/quackstack/blob/6792fd5868f28573bb8f9565977df84e7ba50f42/.github/workflows/quackstack.yml#L39-L52)
Is there any package to do that? Then, while processing POST /api/assets, I could change the key in its handler so that GET /api/players gives me the updated result (and I could also change the key in the 30-second cycle), and after that it would give me the cached result until it is updated in the next cycle.
Note: If you have a solution, please try to stick with npm packages rather than something like Redis, unless it's the only/best solution.
Thanks in advance!
(P.S. I'm a beginner and this is my first question on SO)
Typically caching is done with the help of Redis. Redis is an in-memory key-value store. You could handle the cache in the following manner:
In your handler for the POST operation, update/reset the cached entry for players.
In your handler for the GET operation, if Redis has the entry in cache, return it; otherwise query the data, add the entry to the cache, and return the data.
Alternatively, you could use Memcached.
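A rough sketch of that pattern, assuming node-redis v4 and hypothetical fetchPlayersFromDb/createAsset helpers:

const { createClient } = require('redis');
const redis = createClient();
redis.connect();

app.get('/api/players', async (req, res) => {
  const cached = await redis.get('players');
  if (cached) return res.json(JSON.parse(cached)); // cache hit

  const players = await fetchPlayersFromDb();      // cache miss: query the DB
  await redis.set('players', JSON.stringify(players), { EX: 30 }); // expire with the 30s cycle
  res.json(players);
});

app.post('/api/assets', async (req, res) => {
  const asset = await createAsset(req.body);
  await redis.del('players'); // invalidate so the next GET returns fresh data
  res.json(asset);
});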
A bit late to this answer, but I was looking for a similar solution. I found that the apicache library not only allows caching for specified durations, but the cache can also be cleared manually.
apicache.clear([target]) - clears cache target (key or group), or entire cache if no value passed, returns new index.
Here is an example for your implementation:
// POST /api/assets
app.post('/api/assets', function (req, res, next) {
  // update assets, then clear the cache
  apicache.clear()
  // or only clear the specific players cache by using a parameter
  // apicache.clear('players')
  res.send(response)
})
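For completeness, a hedged sketch of the GET side using apicache's group feature (the duration and the fetchPlayersFromDb helper are illustrative):

const apicache = require('apicache')
const cache = apicache.middleware

app.get('/api/players', cache('30 seconds'), async (req, res) => {
  // Tag this cached response so it can be cleared selectively.
  req.apicacheGroup = 'players'
  res.json(await fetchPlayersFromDb())
})

// Elsewhere (the POST handler above, or the 30-second update cycle):
// apicache.clear('players')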

Which HTTP Method to Choose When Building Restful API

I am new to node.js and have built my first node.js RESTful API with the hapi.js framework. All the services basically do is run database queries. An example of the services looks like this:
let myservice = {
  method: "POST",
  path: "/updateRule",
  config: {
    handler: (request, reply) => {
      updateRule(request.payload)
        .then((result) => {
          reply(successResponse(request, result));
        })
        .catch((err) => reply(failResponse(request, err)).code(500));
    },
    validate: {
      payload: {
        ruleId: joi.number().required(),
        ruleName: joi.string().required(),
        ruleDesc: joi.string().required()
      }
    },
    auth: "jwt",
    tags: ["api", "a3i"]
  },
}
updateRule(input): Promise<any> {
  return new Promise((resolve, reject) => {
    let query = `select a3i.update_rule(p_rul_id := ${input.ruleId}, p_rul_name := '${input.ruleName}', p_rul_desc := '${input.ruleDesc}')`;
    postgresQuery(lbPostgres, query, (data, commit, rollback) => {
      try {
        let count = data.rows[0].update_rule.count;
        if (count === 1) {
          let ruleId = data.rows[0].update_rule.result[0];
          let payload: SuccessPayload = {
            type: "string",
            content: `Rule ${ruleId} has been updated`
          };
          commit();
          resolve(payload);
        } else {
          let thisErr = new Error("No rule can be found.");
          thisErr.name = "4003";
          throw thisErr;
        }
      }
      catch (err) {
        rollback();
        if (err.name === "4003") {
          reject(detailError(4003, err.message));
        } else {
          reject(detailError(4001, err.message));
        }
      }
    }, reject);
  });
}
As you can see, when the service is called, it invokes a database call (query) and updates the specified row in a database table. Similarly, I have other services named createRule/deleteRule that create/delete records in database tables. In my opinion, the only difference between the services is the database query they run. I read the post PUT vs. POST in REST but couldn't see any difference between POST and PUT in my case.
Here are my questions:
What HTTP method should I used in this case?
Most RESTful API examples (for example https://www.codementor.io/olatundegaruba/nodejs-restful-apis-in-10-minutes-q0sgsfhbd) use the same URL with different HTTP methods for different operations on the same "resource", which in my opinion usually maps to a database table. What's the benefit of this architecture compared to my practice, in which each URL has only one HTTP method and does only one type of operation?
I know this question does not refer to a concrete problem and is not specific. Some people may give it a down-vote. But as a beginner I really want to know what a typical RESTful API looks like and make sure my API follows best practice. Please help!
If the resource already exists and thus you have a specific URI to that exact resource and you want to update it, then use PUT.
If the resource does not exist yet, you want to create it, and you will let the server pick the URI that represents the new resource, then use POST. The POST URI will be a generic "create new resource" URI, not a URI for a specific resource, and the server will create the URI that represents the new resource.
You can also use PUT to create a new resource if the caller is going to create the resource URI that represents the new resource. In that case, you would just PUT to that new resource and, if a resource with that URI already exists, it would be updated, if not, it would be created.
You do not have to support both. You can decide to make your api work in a way that you just use one or the other.
In your specific case, an update of a specific row in your database that already exists would pretty much always be a PUT because it already exists so you're doing a PUT to a specific URI that represents that row.
What's the benefit of this architecture compared with my practice in which one URL only has one HTTP method and only do one type of operation?
It's really up to you how you want to present your API. The general concept behind REST is that you have several components:
resource identifier
data
method
In some cases, the method can be subsumed by GET, PUT, POST or DELETE so you just need the resource identifier, data and GET, PUT, POST or DELETE.
In other cases or other designs, the method is more detailed than can be expressed in just a PUT or POST, so you have a method actually in the URL in which case, you may not need the distinction between PUT and POST as much.
For example, an action might be "buy". While you could capture that in a POST where the method is implied by the rest of the URL, you may want to actually POST to a URL that has a method in it: /buy for clarity and then you may use that same endpoint prefix with other methods such as /addToCart, etc... It really depends upon what the objects are in your REST design and what operations you want to surface on them. Sometimes, the objects lends themselves to just GET, PUT, POST and DELETE and sometimes, you want more info in the URL as to the specific operation to be carried out on that resource.
If you want to be REST compliant, you can just use POST and GET.
If you want to be RESTful, base your methods on CRUD (a route sketch follows below):
Create -> POST
Read -> GET
Update -> PUT or PATCH
Delete -> DELETE
When building a full API, using different methods on the same URL can be easier to build and understand: all queries about your user live on the user URL, rather than user/get, user/add, user/update and so on. You get the same functionality without too many different URLs.
When you build an API, you will also want logs, for stats analysis and other things. If you split by method, you can simply filter the logs to count POST requests or GET requests.
In fact, you could build an API with only GET requests. But splitting by method and URL is the best way to avoid complex URLs (or URLs with too many action names) and the easiest way to log every request going through your API.
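To make that CRUD mapping concrete, here is a hedged sketch of hapi-style route definitions for a single rules resource, following the question's own route format (handler names are placeholders):

let ruleRoutes = [
  { method: "GET",    path: "/rules",      config: { handler: listRules } },   // Read (collection)
  { method: "GET",    path: "/rules/{id}", config: { handler: getRule } },     // Read (single)
  { method: "POST",   path: "/rules",      config: { handler: createRule } },  // Create
  { method: "PUT",    path: "/rules/{id}", config: { handler: updateRule } },  // Update (replace)
  { method: "DELETE", path: "/rules/{id}", config: { handler: deleteRule } }   // Delete
];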
Level 1 is REST
Level 2 is RESTful
Level 3 is HATEOAS
You can find more information in books or articles written by Martin Fowler.
What I usually do is use "POST" for creating a new resource, and use "PUT" for updating an already existing resource.
For your second question: yes, most APIs use the same URL to do different things on the same resource. That could be for security reasons, where you don't want to expose what you are doing in your URLs (/delete for example). Also, many frameworks generate an automatic URL for a resource (object class), which is then differentiated by the request method; people just don't tend to use custom URLs for those.

CouchDB Read Configuration from design document

I would like to store a value in the config file and look it up in the design document for comparing against update values. I'm sure I have seen this but, for the life of me, I can't seem to remember how to do this.
UPDATE
I realize (after the first answer) that there was more than one way to interpret my question. Hopefully this example clears it up a little. Given a configuration:
curl -X PUT http://localhost:5984/_config/shared/token -d '"0123456789"'
I then want to be able to look it up in my design document
{
  "_id": "_design/loadsecrets",
  "validate_doc_update": {
    "test": function (newDoc, oldDoc) {
      if (newDoc.supersecret != magicobject.config.shared.token) {
        throw({unauthorized: "You don't know the super secret"});
      }
    }
  }
}
It's the ability to do something like the magicobject.config.shared.token lookup that I am looking for.
UPDATE 2
Another potentially useful (contrived) scenario
curl -X PUT http://trustedemployee:5984/_config/eventlogger/detaillevel -d '"0"'
curl -X PUT http://employee:5984/_config/eventlogger/detaillevel -d '"2"'
curl -X PUT http://vicepresident:5984/_config/eventlogger/detaillevel -d '"10"'
Then on devices tracking employee behaviour:
{
  "_id": "_design/logger",
  "updates": {
    "logger": function (doc, req) {
      if (!doc) {
        doc = {_id: req.id};
      }
      if (req.level < magicobject.config.eventlogger.detaillevel) {
        doc.details = req.details;
      }
      return [doc, req.details];
    }
  }
}
Here's a follow-up to my last answer with more general info:
There is no general way to use configuration, because CouchDB is designed with scalability, stability and predictability in mind. It has been designed using many principles of functional programming and pure functions, avoiding side effects as much as possible. This is a Good Thing™.
However, each type of function has additional parameters that you can use, depending on the context the function is called with:
show, list, update and filter functions are executed for each request, so they get the request object. Here you have req.secObj and req.userCtx to (ab)use for common configuration. Also, AFAIK the this keyword is set to the current design document, so you can use the design doc to get common configuration (this worked at least up to CouchDB 1.6).
view functions (map, reduce) don't have additional parameters, because the results of a view are written to disk and reused in subsequent calls. map functions must be pure (so don't use e.g. Math.random()). For shared configuration across view functions within a single design doc you can use CommonJS require(), but only within the views.lib key.
validate doc update functions are not necessarily executed within a user-triggered http request (they are called before each write, which might not be triggered only via http). So they have the userCtx and secObj added as separate parameters in their function signature.
So to sum up, you can use the following places for configuration:
userCtx for user-specific config. Use a special role (e.g. with a prefix) for storing small config bits. For example superLogin does this.
secObj for database-wide config. Use a special member name for small bits (as you should normally use roles instead of explicit user names, secObj.members.names or secObj.admins.names is a good place).
the design doc itself for design-doc-wide config. It is best to use this.views.lib.config for this, as you can also read that key from within views. But keep in mind that all views are invalidated as soon as you change this key. So if the view results will stay the same no matter what the config values are, it might be better to use a this.config key.
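As a small illustration of the views.lib approach (the field names threshold and amount are made up):

{
  "_id": "_design/myapp",
  "views": {
    "lib": {
      "config": "module.exports = { threshold: 100 };"
    },
    "expensive_docs": {
      "map": "function (doc) { var config = require('views/lib/config'); if (doc.amount > config.threshold) { emit(doc._id, doc.amount); } }"
    }
  }
}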
Hope this helps! I can also add examples if you wish.
I think I know what you're talking about, and if I'm right then what you are asking for is no longer possible. (at least in v1.6 and v2.0, I'm not sure when this feature was removed)
There was a lesser-known trick that allowed a view/show/list/validation/etc function to access the parent design document as this in your function. For example:
{
  "_id": "_design/hello-world",
  "config": {
    "PI": 3.14
  },
  "views": {
    "test": {
      "map": "function (doc) { emit(this.config.PI); }"
    }
  }
}
This was a really crazy idea, and I imagine it was removed because it created a circular dependency between the design document and the code of the view that made the process of invalidating/rebuilding a view index a very tricky affair.
I remember using this trick at some point in the distant past, but the feature is definitely gone now. (and likely to never return)
For your special use-case (validating a document with a secret token), there might be a workaround, but I'm not sure if the token might leak in some place. It all depends what your security requirements are.
You could abuse the 4th parameter to validate_doc_update, the securityObject (see the CouchDB docs) to store the secret token as the first admin name:
{
  "test": "function (newDoc, oldDoc, userCtx, secObj) {
    var token = secObj.admins.names[0];
    if (newDoc.supersecret != token) {
      throw({unauthorized: "You don't know the super secret"});
    }
  }"
}
So if you set the db's security object to {admins: {names: ["s3cr3t-t0k3n"], roles: ["_admin"]}}, you have to pass 's3cr3t-t0k3n' as the doc's supersecret property.
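For example, a sketch of setting that security object (database name and admin credentials are placeholders):

curl -X PUT http://admin:password@localhost:5984/db/_security \
     -d '{"admins": {"names": ["s3cr3t-t0k3n"], "roles": ["_admin"]}, "members": {"names": [], "roles": []}}'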
This is obviously a dirty hack, but as far as I remember the security object may only be read or modified by admins, so you wouldn't immediately leak your token to the public. But consider adding a separate layer between CouchDB and your caller if you need "real" security.

CouchDB: Single document vs "joining" documents together

I'm trying to decide the best approach for a CouchApp (no middleware). Since there are similarities to my idea, let's assume we have a Stack Overflow page stored in CouchDB. In essence it consists of the actual question on top, answers and comments. Those are basically three layers.
There are two ways of storing it: either within a single document containing a suitable JSON representation of the data, or storing each part of the entry in a separate document and combining them later through a view (similar to this: http://www.cmlenz.net/archives/2007/10/couchdb-joins)
Now, both approaches may be fine, yet both have massive downsides from my current point of view. Storing a busy document (many changes by multiple users are expected) as a single entity would cause conflicts. If user A stores his/her changes to the document, user B would receive a conflict error once he/she is finished typing his/her update. I imagine it's possible to fix this without the user's knowledge by re-downloading the document before retrying.
But what if the document is rather big? I expect them to become rather bloated over time, which would put quite a noticeable delay on the save process, especially if the retry has to happen multiple times because many users are updating a document at the same time.
Another problem I see is editing. Every user should be allowed to edit his/her own contributions. Now, if they're stored within one document it might be hard to write a solid auth handler.
Ok, now let's look at the multiple-documents approach. Question, answers and comments would be stored in their own documents. Advantage: only the actual owner of a document can cause conflicts, something that won't happen too often. Being rather small parts of the whole, re-downloading wouldn't take much time. Furthermore, the auth routine should be quite easy to realize.
Now here's the downside. The single document is really easy to query and display. Having a lot of unsorted snippets lying around seems messy, since I haven't yet managed to get a view to present me with a 100% ready-to-use JSON object containing the entire item in an ordered and structured format.
I hope I've been able to communicate the actual problem. I'm trying to decide which solution would be more suitable for me and which problems are easier to overcome. I imagine the first solution to be the prettier one in terms of storage and querying, yet the second one the more practical one, solvable through better key management within the view (I'm not entirely into the principle of keys yet).
Thank you very much for your help in advance :)
Go with your second option. It's much easier than having to deal with the conflicts. Here are some example docs showing how I might structure the data:
{
  _id: 12345,
  type: 'question',
  slug: 'couchdb-single-document-vs-joining-documents-together',
  markdown: 'Im tryting to decide the best approach for a CouchApp (no middleware). Since there are similarities to...',
  user: 'roman-geber',
  date: 1322150148041,
  'jquery.couch.attachPrevRev': true
}
{
  _id: 23456,
  type: 'answer',
  question: 12345,
  markdown: 'Go with your second option...',
  user: 'ryan-ramage',
  votes: 100,
  date: 1322151148041,
  'jquery.couch.attachPrevRev': true
}
{
  _id: 45678,
  type: 'comment',
  question: 12345,
  answer: 23456,
  markdown: 'I really like what you have said, but...',
  user: 'somedude',
  date: 1322151158041,
  'jquery.couch.attachPrevRev': true
}
To store revisions of each one, I would store the old versions as attachments on the doc being edited. If you use the jQuery client for CouchDB, you get this for free by adding jquery.couch.attachPrevRev = true. See Versioning docs in CouchDB by jchris.
Create a view like this
fullQuestion: {
  map: function (doc) {
    if (doc.type == 'question') emit([doc._id, null, null], null);
    if (doc.type == 'answer') emit([doc.question, doc._id, null], null);
    if (doc.type == 'comment') emit([doc.question, doc.answer, doc._id], null);
  }
}
And query the view like this
http://localhost:5984/so/_design/app/_view/fullQuestion?startkey=['12345']&endkey=['12345',{},{}]&include_docs=true
(Note: I have not url encoded this query, but it is more readable)
This will get you all of the related documents for the question that you need to build the page. The only thing is that they will not be sorted by date. You can sort them on the client side (in JavaScript), as sketched below.
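For instance, a minimal client-side sort, assuming each doc carries the date field from the example docs above:

results.rows.sort(function (a, b) { return a.doc.date - b.doc.date; });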
EDIT: Here is an alternative option for the view and query
Based on your domain, you know some facts. You know an answer can't exist before its question existed, and a comment on an answer can't exist before the answer existed. So let's make a view that might make it faster to create the display page, respecting the order of things:
fullQuestion: {
  map: function (doc) {
    if (doc.type == 'question') emit([doc._id, doc.date], null);
    if (doc.type == 'answer') emit([doc.question, doc.date], null);
    if (doc.type == 'comment') emit([doc.question, doc.date], null);
  }
}
This will keep all the related docs together, and keep them ordered by date. Here is a sample query
http://localhost:5984/so/_design/app/_view/fullQuestion?startkey=['12345']&endkey=['12345',{}]&include_docs=true
This will get back all the docs you will need, ordered from oldest to newest. You can now zip through the results, knowing that the parent objects will be before the child ones, like this:
function addAnswer(doc) {
  $('.answers').append(answerTemplate(doc));
}

function addCommentToAnswer(doc) {
  $('#' + doc.answer).append(commentTemplate(doc));
}

$.each(results.rows, function (i, row) {
  if (row.doc.type == 'question') displayQuestionInfo(row.doc);
  if (row.doc.type == 'answer') addAnswer(row.doc);
  if (row.doc.type == 'comment') addCommentToAnswer(row.doc);
});
So then you don't have to perform any client-side sorting.
Hope this helps.
