CouchDB design views, updating fields on doc creation

Is it possible to have CouchDB update or change fields on the fly when you create/update a doc? For example, in the design document's validate_doc_update function:
function(newDoc, oldDoc, userCtx) {
}
Within that function I can throw errors like:
if (!newDoc.user_email || !newDoc.user_name || !newDoc.user_password) {
  throw({forbidden: 'all fields required'});
}
My question is: how would I reassign a field? I tried this:
newDoc.user_password = "changed";
with "changed" being some new or hashed value. My overall goal is to build a user registration/login system with Node and CouchDB, and I have not found very good examples.

The validate_doc_update function cannot have any side effects and cannot change the document before storage. It only has the power to block an update or to let it through. This is important, because the function is not only called when a user requests an update, but also when changes are replicated from one CouchDB instance to another. So the function can be called multiple times for one document.
However, CouchDB now supports Document Update Handlers that can modify a document or even build it from scratch. These can be used to convert non-JSON input data into usable documents. You can find some documentation in the CouchDB Wiki.
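For illustration, here is a minimal sketch of such an update handler (the handler name set_password and the field handling are assumptions, not code from the question; also note that the couchjs sandbox has no built-in crypto, so real password hashing is better done in your Node layer before the write):
// Stored as a string under "updates": { "set_password": "..." } in a design document.
function (doc, req) {
  var body = JSON.parse(req.body); // the JSON the client sent
  if (!doc) {
    // Document does not exist yet: build it from the request.
    doc = { _id: req.id || req.uuid, type: 'user' };
  }
  doc.user_name = body.user_name;
  doc.user_email = body.user_email;
  // Placeholder transformation; replace with a hash computed elsewhere.
  doc.user_password = 'changed:' + body.user_password;
  // Return the document to store and the HTTP response body.
  return [doc, JSON.stringify({ ok: true, id: doc._id })];
}
It would be invoked with a request such as PUT /db/_design/users/_update/set_password/some_id.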
Before you build your own user registration/login system, I'd suggest you look into the built-in CouchDB security features (if you haven't - some information here). They might not be enough for you (e.g. if you need email validation or something similar), but maybe you can build on them.

Related

CouchDB - Get custom fields within _users for replication filtering

I am developing a simple client for Android which fetches data from a CouchDB database. There will be only one database for all users. The pull-replicated data is filtered by a JS function. Such a function (simplified) would look like this:
function (doc, req) {
  if (!doc.type || doc.type != 'item') { return false; }
  if (doc.foo && ... && req.userCtx.bar.indexOf(doc.foo) != -1) { return true; }
  ...
}
As I have read in the official documentation, _users is a perfect place to set custom fields related to the user, and that is what I did, as you can see in the above code (see the req.userCtx.bar array).
The problem I am facing is that the object/JSON req.userCtx only contains these fields: db, name and roles.
1. What would be a good alternative to my idea? I am a little bit stuck at this point.
2. How can I retrieve the user's data (all fields, official and custom)?
3. Is it correct to pass a large array as a filter parameter?
NOTE
I am thinking of a messy alternative: adding an array field to every item containing the list of all users allowed to pull that item, although I have the feeling that there must be another way.
Saving user data in _users is interesting because only the user or an admin can read a user's document.
However, as you've found out, that doesn't mean that all user data is available to the userCtx object. All you get is the user's name and roles array. Can you make do with roles?
To retrieve all of the user's data, you should fetch the user's document from the _users database. You can do that with a GET request on http://localhost:5984/_users/org.couchdb.user:[USER].
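For example (just a sketch; Node 18+ with a global fetch is assumed, and the credentials are placeholders), you could fetch the full user document like this:
// Fetch the _users document for a given user name, authenticating as that user or an admin.
async function getUserDoc(name) {
  const res = await fetch(
    'http://localhost:5984/_users/org.couchdb.user:' + encodeURIComponent(name),
    { headers: { Authorization: 'Basic ' + Buffer.from('admin:secret').toString('base64') } }
  );
  if (!res.ok) throw new Error('HTTP ' + res.status);
  return res.json(); // contains name, roles and any custom fields
}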
To know what would be an appropriate solution to your problem, we'd need quite a bit more info. For instance, looking at your code, it seems you designed that filter with the intention of restricting replication to documents listed as being visible to the user. However, you can't really lock down CouchDB in a way where replication works but the user doesn't have read access to the entire database. You really need one db per user for this to work.

CouchDB replication strategy with dynamic groups of users

This is the situation:
We have a series of users who share some documents. The documents they can share might change throughout the day, and so can the documents themselves (changes and deletions). The users can change some information on the documents.
E.g.
Users | Documents
A | X
A | Y
A | Z
B | X
B | Z
C | Y
Possible groups: A+C, A+B
The CouchDB server holds a replica of a SQL Server DB with this data; an ETL process takes care of propagating changes to CouchDB. The CouchDB database is in turn replicated onto each user's phone via PouchDB.
The goal:
To replicate changes and deletions accordingly.
What we've tried:
1) We figured we'd structure our documents with a list of users that can access them. Each document would have a "Users" array, and a filter in the design document would take care of the replication to the clients. Unfortunately, document deletions and document changes that no longer pass the filter (e.g. a user is removed from the array) don't come through the filtered _changes feed, so they cannot be replicated accordingly on the clients.
2) Database per user. This is not possible, because users need to see each other's work on the documents (they share them).
3) Database per group of users. Pretty much the same problem as the first solution, but worse. In fact:
- groups of users can change and no longer be present: how do you reflect that client-side?
- a document can shift to a new group: it will have to be redownloaded from scratch. This greatly increases the download size
- the same document can be in more than one group! (see example above)
- each client would have to know which groups she is in every time she logs in and replicate multiple databases. Then, on the return trip, you'd have to know in which databases the document was present
Is there a recipe for this situation? Am I missing an obvious solution?
EDIT
Partial solution for case 1:
localDB.sync(remoteDB, {
  live: true,
  retry: true,
  filter: 'app/by_user',
  query_params: { "agente": agent }
})
.on('paused', function (info) {
  console.log("paused");
  localDB.allDocs().then(function (docs) {
    console.log("allDocs");
    docs.rows.forEach(function (row) {
      console.log(row);
      remoteDB.get(row.id)
        .then(function (doc) {
          if (doc.Agents.indexOf(agent) < 0) {
            localDB.remove(doc);
          }
        });
    });
  });
})
.on('change', function (result) {
  console.log("change!");
  result.change.docs.forEach(function (change) {
    if (!change.deleted) {
      $rootScope.$apply(function () {
        $rootScope.$broadcast('upsert', change);
      });
    }
  });
});
Each remove() is giving me a 409 (conflict), and rightfully so. Is there a way to tell Pouch "no longer consider this as replicable and just remove it from my DB?"
(3) Seems like the simplest solution to me, i.e. the "database per role" solution.
I think your difficulty stems from trying to manage permissions inside the documents themselves (and then using filtering replication). When you do that, you are basically trying to mirror CouchDB's permission system inside your documents, which is going to cause headaches.
Why not create a database per role, and assign roles to users using the normal _users database? If roles change, then users will lose or gain access to a set of documents. You would need to have server endpoints to handle the role-shuffling, or you would need to set up separate "admin" databases with special privileges, where users can change the roles.
Then on the client side, you can either replicate from multiple CouchDB databases into multiple local PouchDBs (and then collate the results together yourself), or into a single PouchDB (probably a bad idea if you need to sync bidirectionally). Obviously you would need an initial step where you determine which databases the user has access to, but that's a small downside in my opinion.
Then if the user loses access to a document, they will simply get normal 401 errors during replication (which will show up in the 'denied' event during live replication). No need for ddocs or filtered replication - much simpler!
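A rough sketch of that client side (the role/database names and the remote URL are placeholders, and figuring out which roles the user has is left to your own endpoint):
// One local PouchDB per role database, collated together in the UI.
var roles = ['role_sales', 'role_support'];
roles.forEach(function (role) {
  var local = new PouchDB(role);
  var remote = new PouchDB('http://localhost:5984/' + role);
  local.sync(remote, { live: true, retry: true })
    .on('denied', function (err) {
      // fires when the server rejects a change, e.g. after the user lost the role
      console.log('denied', err);
    })
    .on('error', function (err) { console.log(err); });
});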
We arrived at the conclusion that:
1) our use-case might not be what CouchDB is good for
2) we value our mental health. After almost a month struggling with this problem we'd rather try and fail
3) documents are relatively inexpensive, so even if they stay on the user's phone that won't cause any major distress. If the data builds up too much they can simply clear the data and start fresh
Solution:
1) Keep the architecture as described in point 1 above.
2) Each time the 'paused' event triggers, compare local docs with remote docs; if a remote doc doesn't pass the filter, remove it from the UI. Should there be a way to remove the local document only, we'll be very interested in upgrading to that logic.
1) still sounds like the simplest approach to me.
I don't know PouchDB very well, but in plain CouchDB the problem of changes to deleted documents can be worked around by keeping extra attributes on the deleted document, using your own custom DELETE function.
I mean, a delete is just an update which sets the _deleted attribute to true.
So, instead of directly deleting documents using the normal CouchDB CRUD DELETE on the document, you can create an update function like this:
function (doc, req) {
  // optional ACLs for deleting the doc.. doc is owned by req.userCtx.name
  // doc.users are users already granted to work with this doc
  return [{
    "_id": doc._id,
    "_rev": doc._rev,
    "_deleted": true,
    "users": doc.users
  }, "Ok doc deleted"];
}
Furthermore, using document rewrite rules, this update function can even be called when submitting an HTTP DELETE request (not only on PUT or POST). In this way your delete behaviour becomes totally transparent to the client, and you delete in a way which is more useful for your use case.
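As a sketch only (the handler name soft_delete and the paths are assumptions; in CouchDB 1.x the rewrites array lives in the design document and targets are resolved relative to it), the design document could look roughly like this:
{
  "_id": "_design/app",
  "updates": {
    "soft_delete": "function (doc, req) { /* the update function above */ }"
  },
  "rewrites": [
    { "from": "/docs/:docid", "to": "_update/soft_delete/:docid", "method": "DELETE" }
  ]
}
A client could then issue DELETE /db/_design/app/_rewrite/docs/<docid> and the soft delete would run instead of a hard delete.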
The Smileupps Chatty couchapp tutorial app uses this approach: extended deletes for different document types are performed within user/drop.js, profile/drop.js, chat/drop.js files

Use linux timestamp in CouchDB map function

I'm trying to update an existing CouchDB map function so that it only returns docs created in the past 24 hours.
The current map function is very simple:
function (doc) {
  if (doc.email && doc.type == 'user')
    emit(doc.email, doc);
}
I'd like to get the current Unix timestamp value and compare it to the creationTime.unix value stored in the doc.
Is that possible?
N.B. I'm building the view in Futon.
I do not know if you can do that, but if you could, it would be very bad for your CouchDB database's sanity.
A map function should always emit the same values for the same document, each time it is invoked (provided the document has not changed in the meantime). This is important since CouchDB stores the emitted data in the index and does not recalculate it until necessary. If map functions could emit different values for the same doc, that would render the index unusable.
So, no, do not try that.
The good news is that you can easily achieve what you need without that. If you emit the creation time, then you can query your view just for docs with a creation time in a certain interval, as in:
/blog/_design/docs/_view/by_date?startkey="2010/01/01 00:00:00"&endkey="2010/02/00 00:00:00"
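With the numeric creationTime.unix field from the question, the same idea would look roughly like this (a sketch; the by_date view name follows the URL above, the rest is assumed):
// Map function: emit the Unix creation timestamp as the key.
function (doc) {
  if (doc.email && doc.type == 'user' && doc.creationTime) {
    emit(doc.creationTime.unix, doc.email);
  }
}
// Client side: compute "24 hours ago" at query time, never inside the view.
var startkey = Math.floor(Date.now() / 1000) - 24 * 60 * 60;
// GET /db/_design/docs/_view/by_date?startkey=<that value>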
Read more about how you can query your views in CouchDB: The Definitive Guide.

How and where do you define your database structure in Meteor?

I am looking at the documentation for Meteor and it gives a few examples. I'm a bit confused about two things: First, where do you build the db (keeping security in mind)? Do I keep it all in the server/private folder to restrict client-side access? And second, how do I define the structure? For example, the code they show:
Rooms = new Meteor.Collection("rooms");
Messages = new Meteor.Collection("messages");
Parties = new Meteor.Collection("parties");
Rooms.insert({name: "Conference Room A"});
var myRooms = Rooms.find({}).fetch();
Messages.insert({text: "Hello world", room: myRooms[0]._id});
Parties.insert({name: "Super Bowl Party"});
I don't understand how a collection's structure is defined. Are they just able to define a collection and throw arbitrary data into it?
To answer your first question about where to put the new Meteor.Collection statements, they should go in a .js file in a folder accessible by both client and server, such as /collections. (With some exceptions: any collections that are never synced to the client, like server logs, should be defined inside /server somewhere; and any local collections should be defined in client code.)
As for your second question about structure: MongoDB is a document database, which by definition has no structure. Per the docs:
A database holds a set of collections. A collection holds a set of documents. A document is a set of key-value pairs. Documents have dynamic schema. Dynamic schema means that documents in the same collection do not need to have the same set of fields or structure, and common fields in a collection’s documents may hold different types of data.
You may also have heard this called NoSQL. Each document (record in SQL parlance) can have different fields. Hence, there's no place where you define initial structure for a collection; each document gets its "structure" defined when it's inserted or updated.
In practice, I like to create a block comment above each new Meteor.Collection statement explaining what I intend the structure to be for most or all documents in that collection, so I have something to refer to later on when I insert or update the collection's documents. But it's up to me in those insert or update functions to follow whatever structure I define for myself.
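For instance (purely illustrative; the collection and field names are made up):
/*
 * rooms
 *   name:      String - human-readable room name
 *   createdAt: Date   - when the room was created
 *   ownerId:   String - _id of the user who created it
 * Nothing enforces this; it is only a convention my insert/update code follows.
 */
Rooms = new Meteor.Collection("rooms");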
A good practice would probably be defining your collection on both client and server with a single bit of javascript code. In other words, put the following
MyCollection = new Meteor.Collection("rooms");
// ...
anywhere except the client or the server directory. Note that this declaration alone does not expose any sensitive data to anybody.
A brand new Meteor project contains the insecure and autopublish packages by default. The former basically allows any client to alter your database in every possible way, i.e. insert, update and remove documents. The latter makes sure that all database content is published to everyone, no matter how ridiculous that may sound. But fear not! Their only goal is to simplify the development process at a very early stage. You should get rid of these two packages as soon as you start considering security issues of any kind.
As soon as the insecure package is removed from your project, you can control the database privileges by defining MyCollection.allow and MyCollection.deny rules. Please check the documentation for more details. The only thing I would like to mention here is that this code should probably be considered sensitive, so I guess you should put it in your server directory.
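A minimal sketch of such rules, assuming an ownerId field on the documents (the policy itself is just an example):
// server/permissions.js
Rooms.allow({
  insert: function (userId, doc) {
    // only logged-in users may insert, and only documents they own
    return !!userId && doc.ownerId === userId;
  },
  update: function (userId, doc, fieldNames, modifier) {
    return doc.ownerId === userId;
  },
  remove: function (userId, doc) {
    return doc.ownerId === userId;
  }
});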
Removing the autopublish package affects the set of data that will be sent to your clients. Again, you can control it and define privileges of your choice by implementing a custom Meteor.publish routine. This is all documented here. In this case you have no choice: the code can only run in the server environment, so the best place for it is the server directory.
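Roughly like this (a sketch; the publication name and the ownerId field are assumptions):
// server/publications.js
Meteor.publish("myRooms", function () {
  // publish only the rooms owned by the currently logged-in user
  return Rooms.find({ ownerId: this.userId });
});
// client/subscriptions.js
Meteor.subscribe("myRooms");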
About your second question: the whole buzz about NoSQL databases (like MongoDB) is about putting as few restrictions on the structure of your database as possible. In other words, how the collections are structured is entirely up to you. You don't have to define any models, and you can change the structure of your documents (add or remove fields) any time you want. Doesn't it sound great? :)

I want absolute atomicity on a single couchdb instance (insert, fail if already existing)

I've come to really love the couchdb style of organizing and updating data, but there are a few situations where I really need to be able to create an entry and determine if an equivalent entry is already in existence before returning to the user. The only situation that this is absolutely necessary for my application is user registration. I'm fine with having all user registration writes go to a particular, designated couchdb instance known as the "registration-instance".
I want to hash the user_id into some _id to use. Then execute a put with this _id, but fail if the _id is already inserted. I need to return to the user that the user name is already reserved, and I cannot detect the conflict later and resolve it at that point, because the user would be under the impression that they had reserved the user name.
I don't see why CouchDB couldn't provide some way to do this, under the assumption that inserts for a particular "type" of document are always routed to a particular instance.
If you send a single CouchDB server a PUT request for a new user document you should get the behavior you want already.
If the document does not exist then it will create the new document.
If the document does exist then it is guaranteed to return a 409 conflict error. This is due to the fact that you did not supply a _rev property because you aren't trying to update the pre-existing document.
Only when the _id and _rev properties match will CouchDB update the existing document.
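In other words (a sketch; the database name, the _id scheme and Node's global fetch are assumptions):
// Reserve a user name by writing a doc whose _id is derived from it.
// 201 = created (name reserved), 409 = a doc with this _id already exists.
async function reserveUserName(name) {
  const res = await fetch(
    'http://localhost:5984/registration/user:' + encodeURIComponent(name),
    {
      method: 'PUT',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ user_name: name, created_at: Date.now() })
    }
  );
  if (res.status === 409) return false; // name already taken
  if (!res.ok) throw new Error('HTTP ' + res.status);
  return true;
}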
You might also want to read up on document update handlers:
http://wiki.apache.org/couchdb/Document_Update_Handlers
You might use an update handler to hash the user_id and dynamically assign the appropriate _id. You can also customize what kind of error response couch sends with an update handler.
Good luck!
