CouchDB replication strategy with dynamic groups of users - couchdb

This is the situation:
We have a series of users who share some documents. The documents they can share might change throughout the day, so can the documents themselves (changes and deletions). The users can change some information on the documents.
E.g.
Users | Documents
A | X
A | Y
A | Z
B | X
B | Z
C | Y
Possible groups: A+C, A+B
The CouchDB server is a replica of a SQL Server DB holding this data; an ETL process takes care of applying changes to CouchDB. The CouchDB database is in turn replicated to each user's phone via PouchDB.
The goal:
To replicate changes and deletions accordingly.
What we've tried:
1) we figured we'd structure our documents with a list of users that can access them. Each document would have a "Users" array, and a filter in the design document would take care of the replication to the clients. Unfortunately, document deletions and document changes that don't pass the filter (e.g. a user is removed from the array) do not appear in the filtered _changes feed, so they cannot be replicated to the clients.
2) database per user. This is not possible, because users need to see each others work on the documents (they share them)
3) database per group of users. Pretty much the same problem as the first solution, but worse. In fact:
- groups of users can change and no longer be present: how do we reflect that client-side?
- a document can shift to a new group: it will have to be redownloaded from scratch. This greatly increases the download size
- the same document can be in more than one group! (see example above)
- each client would have to know which groups she is in every time she logs in, and replicate multiple databases. Then on the return trip you'd have to know which databases the document was present in
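To make the problem with approach 1 concrete, here is a minimal sketch of such a filter. The "Users" field and the "agente" query parameter follow the description and the sync call in this post; the function name and the sample documents are only illustrative.

```javascript
// Filter roughly as it would be stored under "filters" in the design document.
// Assumption: each document carries a "Users" array and the client passes
// its own identifier as the "agente" query parameter.
function byUser(doc, req) {
  var users = doc.Users || [];
  return users.indexOf(req.query.agente) !== -1;
}

// Before the change, user C may see document Y.
var before = { _id: 'Y', _rev: '1-a', Users: ['A', 'C'] };
// After C is removed, the update no longer passes the filter for C,
// so C's client never hears about it and keeps the stale copy.
var after = { _id: 'Y', _rev: '2-b', Users: ['A'] };
// A plain DELETE leaves a tombstone without the array, with the same effect.
var tombstone = { _id: 'Y', _rev: '3-c', _deleted: true };

var reqC = { query: { agente: 'C' } };
console.log(byUser(before, reqC), byUser(after, reqC), byUser(tombstone, reqC));
// true false false
```

The two `false` results are exactly the changes the client would need to see in order to drop its copy.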
Is there a recipe for this situation? Am I missing an obvious solution?
EDIT
Partial solution for case 1:
localDB.sync(remoteDB, {
  live: true,
  retry: true,
  filter: 'app/by_user',
  query_params: { "agente": agent }
})
.on('paused', function(info){
  console.log("paused");
  localDB.allDocs().then(function(docs){
    console.log("allDocs");
    docs.rows.forEach(function(row){
      console.log(row);
      remoteDB.get(row.id)
      .then(function(doc){
        if(doc.Agents.indexOf(agent) < 0){
          localDB.remove(doc);
        }
      });
    });
  });
})
.on('change', function(result){
  console.log("change!");
  result.change.docs.forEach(function(change) {
    if(!change.deleted){
      $rootScope.$apply(function(){
        $rootScope.$broadcast('upsert', change);
      });
    }
  });
});
Each remove() is giving me a 409 (conflict), and rightfully so. Is there a way to tell Pouch "no longer consider this as replicable and just remove it from my DB?"

(3) Seems like the simplest solution to me, i.e. the "database per role" solution.
I think your difficulty stems from trying to manage permissions inside the documents themselves (and then using filtering replication). When you do that, you are basically trying to mirror CouchDB's permission system inside your documents, which is going to cause headaches.
Why not create a database per role, and assign roles to users using the normal _users database? If roles change, then users will lose or gain access to a set of documents. You would need to have server endpoints to handle the role-shuffling, or you would need to set up separate "admin" databases with special privileges, where users can change the roles.
Then on the client side, you can either replicate from multiple CouchDB databases into multiple PouchDBs (and then collate the results together yourself), or into a single PouchDB (probably a bad idea if you need to sync bidirectionally). Obviously you would need an initial step where you determine which databases the user has access to, but that's a small downside in my opinion.
Then if the user loses access to a document, they will simply get normal 401 errors during replication (which will show up in the 'denied' event during live replication). No need for ddocs or filtered replication - much simpler!
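A minimal sketch of that initial step, under two assumptions that are mine rather than anything CouchDB prescribes: databases are named `docs-<role>`, and the client already has the user's role list (for example from the `userCtx.roles` array returned by `GET /_session`). Both helper names are hypothetical.

```javascript
// Hypothetical naming scheme: one database per role,
// e.g. role "group-ab" -> database "docs-group-ab".
function databasesForRoles(roles) {
  return roles.map(function (role) { return 'docs-' + role; });
}

// Compare the databases currently being replicated with the ones the user
// may access now, so the client knows which replications to start or stop.
function diffDatabases(current, allowed) {
  return {
    start: allowed.filter(function (db) { return current.indexOf(db) === -1; }),
    stop: current.filter(function (db) { return allowed.indexOf(db) === -1; })
  };
}

var replicating = ['docs-group-ab', 'docs-group-ac'];
var allowedNow = databasesForRoles(['group-ab', 'group-bc']);
console.log(diffDatabases(replicating, allowedNow));
// { start: [ 'docs-group-bc' ], stop: [ 'docs-group-ac' ] }
```

What to do with the local copy of a database in the `stop` list (keep it, or destroy it) is a separate decision for the app.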

We arrived at the conclusion that:
1) our use-case might not be what CouchDB is good for
2) we value our mental health. After almost a month struggling with this problem we'd rather try and fail
3) documents are relatively inexpensive, so even if they stay on the user's phone that won't cause any major distress. If the data builds up too much they can simply clear the data and start fresh
Solution:
1) Keep the architecture as in point 1
2) After each 'paused' event fires, compare the local docs with the remote docs; if the remote doc doesn't pass the filter, remove it from the UI. Should there be a way to remove only the local document, we'd be very interested in upgrading to that logic.
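The comparison in step 2 can be sketched as a pure function, assuming the "Agents" array from the code in the question. The helper name and the shape of its arguments are made up for illustration; fetching the ids and the remote documents is left to the caller.

```javascript
// localIds:   ids currently in the local database (e.g. collected from allDocs()).
// remoteDocs: current remote versions keyed by id; a missing entry means the
//             document is gone or no longer readable.
// Returns the ids that should be hidden from the UI for this agent.
function idsToHide(localIds, remoteDocs, agent) {
  return localIds.filter(function (id) {
    var known = Object.prototype.hasOwnProperty.call(remoteDocs, id);
    var doc = known ? remoteDocs[id] : null;
    return !doc || !Array.isArray(doc.Agents) || doc.Agents.indexOf(agent) === -1;
  });
}

var remote = {
  X: { _id: 'X', Agents: ['A', 'B'] },
  Y: { _id: 'Y', Agents: ['A'] }
};
console.log(idsToHide(['X', 'Y', 'Z'], remote, 'B'));
// [ 'Y', 'Z' ]
```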

1) still sounds like the simplest approach to me.
I don't know PouchDB very well, but in plain CouchDB, the missing changes for deleted documents can be worked around by keeping extra attributes on the deleted document, using your own custom DELETE function.
I mean.. a delete is like an update which sets the _deleted attribute to true.
So, instead of directly deleting documents, using the normal CouchDB crud DELETE on document, you can create an update function like this:
function(doc,req){
  // optional acls for deleting doc.. doc is owned by req.userCtx.name
  // doc.users are users already granted to work with this doc
  return [{
    "_id" : doc._id,
    "_rev": doc._rev,
    "_deleted":true,
    "users": doc.users
  },"Ok doc deleted"];
}
Furthermore, using document rewriting rules, this update function can also be called when an HTTP DELETE request is submitted (not only on PUT or POST). That way your delete behaviour becomes totally transparent to the client, and you delete in a way that is more useful for your use case.
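To illustrate why the extended delete helps with filtered replication, here is a small sketch that runs the update function above against a sample document and then applies a user filter to the result. The filter, its "user" query parameter and the sample data are assumptions for the example, not part of the tutorial app.

```javascript
// Update function from above: the tombstone keeps the "users" array.
function softDelete(doc, req) {
  return [{
    _id: doc._id,
    _rev: doc._rev,
    _deleted: true,
    users: doc.users
  }, 'Ok doc deleted'];
}

// Hypothetical replication filter; "user" is an assumed query parameter name.
function byUserIncludingTombstones(doc, req) {
  return Array.isArray(doc.users) && doc.users.indexOf(req.query.user) !== -1;
}

var live = { _id: 'X', _rev: '1-a', users: ['A', 'B'], title: 'Report' };
var extendedTombstone = softDelete(live, {})[0];
var plainTombstone = { _id: 'X', _rev: '2-b', _deleted: true };
var reqB = { query: { user: 'B' } };

console.log(byUserIncludingTombstones(extendedTombstone, reqB)); // true
console.log(byUserIncludingTombstones(plainTombstone, reqB));    // false
```

Because the extended tombstone still lists its users, the deletion passes the filter and can reach the clients that had the document.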
The Smileupps Chatty couchapp tutorial app uses this approach: extended deletes for different document types are performed within user/drop.js, profile/drop.js, chat/drop.js files

Related

Using redis to send friend status

I was looking around the internet to find out how I can send user status such as offline, online, etc. to friends only, using socket.io. Some people were saying to use Redis, so I had a look and played around with it. I am also using MongoDB to store friends and users.
This is my setup right now:
//Status List:
// 0 - offline
// 1 - online
// 2 - away
// 3 - busy
//Set the status
redisClient.hmset ("online_status:userID", "status", "1");
//Check if someone is online
redisClient.hgetall ("online_status:userID", (err, reply) => {
  console.log(reply)
})
Is it fine if I use it like this to get user status, or is there a better way to do this?
Another question: is it fine to keep looping over hgetall, or is there a better way to get multiple statuses at once?
You are using a hash type for storing a single information and you are using hgetall to retrieve it, so I assume you are not that familiar with redis data types yet. So first let me explain in short the three data types I'll talk about (find all types in the docs here https://redis.io/topics/data-types-intro ):
String: Is a simple key/value type, access it with set(key, value) and get(key)
Hash: Is a bunch of key/values stored under one redis key. Useful for storing attributes of an entity, like you could have a "userdata:userID" key and store name, avatar, status... with it. Access it with hset(key, field, value), hget(key, field), hgetall(key)
Set: Is a collection of unique strings, access it with sadd(key, member), sismember(key, member), smembers(key)
If you are only going to save the online status it would be cleaner to use a string type with set, get and del (since usually most users are offline most of the time, delete them and save space). For this simple key/value usecase redis is actually not even better than good old memcache.
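With plain string keys, several statuses can be fetched in one round trip with Redis's MGET command, which returns the values in key order and null for keys that do not exist. Below is a minimal sketch of the surrounding bookkeeping; the helper names are made up, the key format is the one from the question, and the actual client call is only shown as a comment because the method name depends on the client library and version.

```javascript
var OFFLINE = '0';

// Key format taken from the question: "online_status:<userID>".
function statusKey(userId) {
  return 'online_status:' + userId;
}

// Pair MGET replies with the user ids they were requested for.
// Missing keys come back as null and are treated as offline.
function statusesFromReplies(userIds, replies) {
  var result = {};
  userIds.forEach(function (id, i) {
    var reply = replies[i];
    result[id] = (reply === null || reply === undefined) ? OFFLINE : reply;
  });
  return result;
}

var friendIds = ['42', '43', '44'];
var keys = friendIds.map(statusKey);
// With a callback-style client this would be something like:
// redisClient.mget(keys, function (err, replies) {
//   console.log(statusesFromReplies(friendIds, replies));
// });
console.log(keys);
console.log(statusesFromReplies(friendIds, ['1', null, '2']));
// { '42': '1', '43': '0', '44': '2' }
```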
If you intend to store more user related attributes (mood, motto, avatar...) you should rename it to "userdata:userID" and check with hget("userdata:userID", "status") and use hgetall only to retrieve all attributes.
Another approach could be to store all users in a SET: sadd('users:online', userID) and check with sismember('users:online', userID) or get all online users with smembers('users:online'). Suppose you store all friends in another SET friends:userID, you could grab all online friends of a user with a single intersect command sinter('friends:userID', 'users:online') - pretty nice and elegant IMHO, but this gets complicated with more different states and doesn't work with redis-cluster (unless both keys hash to the same slot).
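As a purely in-memory illustration of what the intersect returns (this models the idea with JavaScript Sets and does not talk to Redis; the names and members are invented):

```javascript
// Members present in both sets, which is what SINTER computes on the server.
function intersect(a, b) {
  return Array.from(a).filter(function (member) { return b.has(member); });
}

var usersOnline = new Set(['alice', 'carol', 'dave']);   // models "users:online"
var friendsOfBob = new Set(['alice', 'erin', 'dave']);   // models "friends:bob"

console.log(intersect(friendsOfBob, usersOnline));
// [ 'alice', 'dave' ]
```

Redis does not guarantee any particular order for set members, so real replies should be treated as unordered.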
I would prefer the SET approach. Multiple hgets should also be fine until you encounter issues due to the one guy (there is always one) that has thousands of contacts and refreshes all the time. At that point you could still introduce some friendship limits or caching.

CouchDB - Get custom fields within _users for replication filtering

I am developing a simple client for Android which fetches data from a CouchDB database. There will be only one database for all users. The data pull-replicated is filtered by a JS function. Such function (simplified) would be like this:
function(doc,req) {
  if (!doc.type || doc.type !='item') { return false; }
  if (doc.foo && ... && req.userCtx.bar.indexOf(doc.foo) != -1) { return true; }
  ...
}
As I have read in the official documentation, _users is a perfect place to set custom fields related to the user. So did I as you can see in the above code (see req.userCtx.bar array).
The problem I am facing is that the object/JSON req.userCtx only contains these fields: db, name and roles.
1. What would be a good alternative to my idea? I am a little bit stuck at this point.
2. How can I retrieve the user's data (all fields, official and custom)?
3. Is it correct to pass a large array as a filter parameter?
NOTE
I am thinking of a messy alternative of adding an array-field in every item which will contain the list with all users allowed to pull such item although I have the feeling that there must be another way.
Saving user data in _users is interesting because only the user or an admin can read a user's document.
However, as you've found out, that doesn't mean that all user data is available to the userCtx object. All you get is the user's name and roles array. Can you make do with roles?
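If roles are enough, the filter from the question could look roughly like the sketch below. It assumes, purely for illustration, that `doc.foo` holds the name of the role that is allowed to pull the item; the function name is also made up.

```javascript
// Filter sketch using only what userCtx really provides: name and roles.
// Assumption: doc.foo holds the role name allowed to see the item.
function itemsByRole(doc, req) {
  if (!doc.type || doc.type !== 'item') { return false; }
  var roles = (req.userCtx && req.userCtx.roles) || [];
  return Boolean(doc.foo) && roles.indexOf(doc.foo) !== -1;
}

var reqAlice = { userCtx: { db: 'items', name: 'alice', roles: ['warehouse'] } };
console.log(itemsByRole({ type: 'item', foo: 'warehouse' }, reqAlice)); // true
console.log(itemsByRole({ type: 'item', foo: 'sales' }, reqAlice));     // false
console.log(itemsByRole({ type: 'note', foo: 'warehouse' }, reqAlice)); // false
```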
To retrieve all of the user's data, you should fetch the user's document from the _users database. You can do that with a GET request on http://localhost:5984/_users/org.couchdb.user:[USER].
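A small helper for building that URL, as a sketch. The helper name is hypothetical, and it percent-encodes the document id on the assumption that the server accepts the encoded colon; the request itself still has to be authenticated as the user or an admin.

```javascript
// Build the URL of a user's document in the _users database.
function userDocUrl(baseUrl, name) {
  var docId = 'org.couchdb.user:' + name;
  return baseUrl.replace(/\/+$/, '') + '/_users/' + encodeURIComponent(docId);
}

console.log(userDocUrl('http://localhost:5984/', 'alice'));
// http://localhost:5984/_users/org.couchdb.user%3Aalice
```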
To know what would be an appropriate solution to your problem, we'd need quite a bit more info. For instance, looking at your code, it seems you designed that filter with the intention of restricting replication to documents listed as being visible to the user. However, you can't really lock down CouchDB in a way that replication works, and the user doesn't have read access to the entire database. You really need one db per user for this to work.

How and where do you define your database structure in Meteor?

I am looking at the documentation for Meteor and it gives a few examples. I'm a bit confused about two things: First, where do you build the db (keeping security in mind)? Do I keep it all in the server/private folder to restrict client-side access? And second, how do I define the structure? For example, the code they show:
Rooms = new Meteor.Collection("rooms");
Messages = new Meteor.Collection("messages");
Parties = new Meteor.Collection("parties");
Rooms.insert({name: "Conference Room A"});
var myRooms = Rooms.find({}).fetch();
Messages.insert({text: "Hello world", room: myRooms[0]._id});
Parties.insert({name: "Super Bowl Party"});
I don't understand how a collection's structure is defined. Are they just able to define a collection and throw arbitrary data into it?
To answer your first question about where to put the new Meteor.Collection statements, they should go in a .js file in a folder accessible by both client and server, such as /collections. (With some exceptions: any collections that are never synced to the client, like server logs, should be defined inside /server somewhere; and any local collections should be defined in client code.)
As for your second question about structure: MongoDB is a document database, which by definition has no structure. Per the docs:
A database holds a set of collections. A collection holds a set of
documents. A document is a set of key-value pairs. Documents have
dynamic schema. Dynamic schema means that documents in the same
collection do not need to have the same set of fields or structure,
and common fields in a collection’s documents may hold different types
of data.
You may also have heard this called NoSQL. Each document (record in SQL parlance) can have different fields. Hence, there's no place where you define initial structure for a collection; each document gets its "structure" defined when it's inserted or updated.
In practice, I like to create a block comment above each new Meteor.Collection statement explaining what I intend the structure to be for most or all documents in that collection, so I have something to refer to later on when I insert or update the collection's documents. But it's up to me in those insert or update functions to follow whatever structure I define for myself.
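As a sketch of what that can look like, here is a block comment plus a tiny hand-written check that an insert function could call first. The `capacity` field and the helper name are invented for the example; nothing here is a Meteor API.

```javascript
/*
 * Intended shape of documents in "rooms" (illustrative only):
 *   { name: String (required), capacity: Number (optional) }
 */
function roomProblems(doc) {
  var problems = [];
  if (typeof doc.name !== 'string' || doc.name.length === 0) {
    problems.push('name must be a non-empty string');
  }
  if (doc.capacity !== undefined && typeof doc.capacity !== 'number') {
    problems.push('capacity must be a number');
  }
  return problems;
}

console.log(roomProblems({ name: 'Conference Room A' })); // []
console.log(roomProblems({ capacity: 'ten' }));
// [ 'name must be a non-empty string', 'capacity must be a number' ]
```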
A good practice would probably be defining your collection on both client and server with a single bit of javascript code. In other words, put the following
MyCollection = new Meteor.Collection("rooms");
// ...
anywhere except the client and server directories. Note that this statement alone does not expose any sensitive data to anybody.
A brand new Meteor project contains the insecure and autopublish packages by default. The former basically allows any client to alter your database in every possible way, i.e. insert, update and remove documents. The latter makes sure that all database content is published to everyone, no matter how ridiculous this may sound. But fear not! Their only goal is to simplify the development process at the very early stage. You should get rid of these two from your project as soon as you start considering security issues of any kind.
As soon as the insecure package is removed from your project you can control the database privileges by defining MyCollection.allow and MyCollection.deny rules. Please check the documentation for more details. The only thing I would like to mention here is that this code should probably be considered as a sensitive one, so I guess you should put it into your server directory.
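A minimal sketch of such rules, assuming each document has an `owner` field holding the id of the user who created it (that field is an assumption of this example). The rule callbacks are plain functions, so they are defined on their own here and only handed to `MyCollection.allow` when the collection exists.

```javascript
// Allow rules: only the owner may insert, update or remove a document.
var roomRules = {
  insert: function (userId, doc) {
    return Boolean(userId) && doc.owner === userId;
  },
  update: function (userId, doc, fieldNames, modifier) {
    // Owners may edit their documents but not hand them to someone else.
    return doc.owner === userId && fieldNames.indexOf('owner') === -1;
  },
  remove: function (userId, doc) {
    return doc.owner === userId;
  }
};

// In a Meteor app this would run on the server, where MyCollection is defined.
if (typeof MyCollection !== 'undefined') {
  MyCollection.allow(roomRules);
}
```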
Removing the autopublish package affects the set of data that is sent to your clients. Again, you can control it and define privileges of your choice by implementing a custom Meteor.publish routine. This is all documented here. Here you have no choice: the code can only run in the server environment, so the best place for it is the server directory.
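A sketch of a publish routine, again assuming a hypothetical `owner` field and a made-up publication name. The selector logic is kept in a plain function so it can be read on its own; the `Meteor.publish` call is guarded so the snippet is harmless outside a Meteor server.

```javascript
// Which documents a given user may receive; null means "nothing".
function roomsSelector(userId) {
  return userId ? { owner: userId } : null;
}

if (typeof Meteor !== 'undefined' && Meteor.isServer) {
  Meteor.publish('myRooms', function () {
    var selector = roomsSelector(this.userId);
    // Logged-out clients get an empty, ready publication.
    return selector ? MyCollection.find(selector) : this.ready();
  });
}
```

On the client, `Meteor.subscribe('myRooms')` would then pull only the matching documents.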
About your second question: the whole buzz about NoSQL databases (like MongoDB) is about putting as few restrictions on the structure of your database as possible. In other words, how the collections are structured is entirely up to you. You don't have to define any models, and you can change the structure of your documents (add or remove fields) any time you want. Doesn't it sound great? :)

CouchDB - Views not being updated after delete

I have a view similar to this (a contrived example):
function(doc) {
  if (doc.attrib) {
    emit([doc.attrib], doc._id);
  }
}
Everything works as expected until the data is deleted. I get this crazy scenario where there is no data in the actual database (confirmed via _all_docs and _changes run on curl as well as all_documents on Futon). However the view still yields data (again on both curl and Futon).
The delete comprises bulk delete and purge operations via Ektorp. Running _changes after each confirms these work as expected. Re-creating the view makes it reflect the true state of the documents in the DB.
Have I missed something obvious here or are views in CouchDB only incremental?
Did you really _purge the data? That should invalidate the view and cause a full rebuild. I'll note that _purge is not recommended for normal use. It exists only for accidents like putting your plaintext password in a document.
You may have exposed a bug in _purge, though, so if you can reliably induce this with _purge but not if you just delete, I encourage you to file a ticket on our JIRA (https://issues.apache.org/jira/browse/COUCHDB).
I'll note also that the fix will be to blow away the index if you purge, there is no incremental approach possible (you are literally removing the information that an incremental approach requires).

couchdb design views, updating fields on doc creation

Is it possible to have Couch update or change fields on the fly when you create/update a doc? For example, in the design document's validate_doc_update:
function(newDoc, oldDoc, userCtx) {
}
Within that function I can throw errors like:
// reject the update if any of the required fields is missing
if(!newDoc.user_email || !newDoc.user_name || !newDoc.user_password){
  throw({forbidden : 'all fields required'});
}
My Question is how would I reassign a field? I tried this:
newDoc.user_password ="changed";
with changed being some new value or hashed value. My overall goal is to build a user registration/login system with node and couchdb and have not found very good examples.
The validate_doc_update function cannot have any side effects and cannot change the document before storage. It only has the power to block an update or to let it through. This is important, because the function is not only called when a user requests an update, but also when changes are replicated from one CouchDB instance to another. So the function can be called multiple times for one document.
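A minimal sketch of a validation function that stays within those limits: it only inspects the new document and either returns or throws. The field names come from the question; the function is given a name here only so the snippet can run on its own.

```javascript
// Validation sketch: may only accept (return) or reject (throw), never modify.
function validateUser(newDoc, oldDoc, userCtx) {
  if (newDoc._deleted) { return; } // let deletions through
  var required = ['user_email', 'user_name', 'user_password'];
  for (var i = 0; i < required.length; i++) {
    if (!newDoc[required[i]]) {
      throw({ forbidden: required[i] + ' is required' });
    }
  }
}

try {
  validateUser({ user_name: 'bob' }, null, { name: 'bob', roles: [] });
} catch (e) {
  console.log(e.forbidden); // user_email is required
}
```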
However, CouchDB now supports Document Update Handlers that can modify a document or even build it from scratch. These can be used to convert non-JSON input data into usable documents. You can find some documentation in the CouchDB Wiki.
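A minimal sketch of such an update handler, assuming the client sends a JSON body with `user_name` and `user_email`. The handler name, the `type` field and the lower-casing of the email are illustrative choices. It relies on `doc` being null when no document exists yet and on `req.uuid` and `req.body` from the request object.

```javascript
// Update handler sketch: builds or modifies the document before it is stored.
function registerUser(doc, req) {
  var input = JSON.parse(req.body);
  // doc is null when the request did not address an existing document.
  var newDoc = doc || { _id: req.uuid, type: 'user' };
  newDoc.user_name = input.user_name;
  newDoc.user_email = String(input.user_email || '').toLowerCase();
  return [newDoc, 'saved ' + newDoc._id];
}

var fakeReq = {
  uuid: 'abc123',
  body: JSON.stringify({ user_name: 'Bob', user_email: 'Bob@Example.com' })
};
console.log(registerUser(null, fakeReq));
```

The document returned by the handler still passes through validate_doc_update before it is written.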
Before you build your own user registration/login system, I'd suggest you look into the built-in CouchDB security features (if you haven't - some information here). They might not be enough for you (e.g. if you need email validation or something similar), but maybe you can build on them.

Resources