Per-user DB (PouchDB/CouchDB) and shared data: doable?

I have the following use case / application:
A TODO app where users can CRUD their own TODOs (I am using PouchDB/CouchDB syncing), pretty much based on Josh Morony's tutorial.
Now I want to add the ability for users to "share" their TODO items with other users (post only; there is no "edit"/put). The other users would only be able to view (read) those items, with no write access.
I was thinking about adding a separate DB (let's call it the "shared TODOs DB") that my server can write to and that all users can only read.
Any user could then run .find() across that read-only DB, while posting to it would still be governed by the server, in response to users' requests to share their TODOs.
Is there a known pattern (approach) for this? Are there any real apps / examples that already do that?

CouchDB does not offer a good way to do this, but if your backend has a facility for maintaining lots of shared databases (in addition to single-user ones) I think you could pull it off.
The initial impulse is to use continuous filtered replication to/from a central "master" hub, but this leaves you with two problems:
Replication can't remove a document from the target unless the document is actually deleted in the source! Filtered replication doesn't determine which documents exist in the target, only whether a given document's changes propagate to it. The effect is that you can share, but you can't unshare.
This sort of replication doesn't scale. For N user databases that you need to keep in sync, every change to the central database forces all N replications to process that change independently. Now consider that the number of changes M on the central database will be roughly proportional to N (the more users you have, the more often all of them have to process changes to the central database!), so the total work grows roughly with N². You could somewhat mitigate this by adding more layers (fanning out from one central hub to semi-shared hubs to individual databases), but that also adds practical complications.
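The first problem is easiest to see in a toy model. The sketch below is plain JavaScript, not the real replicator, and the `shared` flag and `replicateFiltered` helper are hypothetical: the filter only decides whether a change propagates, so flipping the flag back never touches the copy already sitting in the hub.

```javascript
// Toy model of one-way filtered replication (NOT the real CouchDB/PouchDB
// replicator): each change from the source is copied to the target only if
// the filter accepts it. Revisions are simplified to "last write wins".
function replicateFiltered(changes, target, filter) {
  for (const doc of changes) {
    if (filter(doc)) {
      target.set(doc._id, doc);
    }
    // A rejected change is skipped; nothing is ever removed from the target.
  }
  return target;
}

// Hypothetical flag that decides what reaches the central hub.
const isShared = (doc) => doc.shared === true;

const hub = new Map();

// 1. The user shares a TODO: the change passes the filter and reaches the hub.
replicateFiltered([{ _id: 'todo1', title: 'Buy milk', shared: true }], hub, isShared);

// 2. The user "unshares" it by flipping the flag: this change is filtered out...
replicateFiltered([{ _id: 'todo1', title: 'Buy milk', shared: false }], hub, isShared);

// ...so the hub still holds the old, shared revision.
console.log(hub.get('todo1').shared); // true
```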
How is your sharing organized? What you might be able to do is set up a shared database for each "share" combination.
When user A wants to share document D with user B, "move" the document into a new database AB. Practically this means: copy the contents of D to a new document D' in database AB, and then delete the original D from database A.
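A minimal sketch of that "move", assuming plain JavaScript document objects and a caller-supplied id for the copy (`shareMove` and `newId` are hypothetical). The copy gets its own `_id` because the local database ends up holding both the deleted original and the new shared document.

```javascript
// Sketch of the share "move": tombstone the private original and create a
// copy tagged with the sharing context. `shared_with` matches the filters in
// the diagram; `newId` is supplied by the caller (the id scheme is up to you).
function shareMove(original, shareId, newId) {
  // Copy: same data, new _id, no _rev (it is a brand-new document).
  const { _rev, ...data } = original;
  const copy = { ...data, _id: newId, shared_with: shareId };
  // Tombstone: keeps _rev (needed to delete) and all other fields.
  const tombstone = { ...original, _deleted: true };
  return { copy, tombstone };
}

const d = { _id: 'todo-1', _rev: '1-abc', title: 'Buy milk' };
const { copy, tombstone } = shareMove(d, 'ab', 'todo-1-ab');
console.log(copy);      // { _id: 'todo-1-ab', title: 'Buy milk', shared_with: 'ab' }
console.log(tombstone); // { _id: 'todo-1', _rev: '1-abc', title: 'Buy milk', _deleted: true }
// With PouchDB, both could then be written with localDB.bulkDocs([tombstone, copy]).
```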
How does this sync?
Keep in mind PouchDB clients can replicate to/from more than one source database, so user A will replicate from A and AB, while user B replicates from B and AB. The trick is to use filtered replication in the opposite direction, back to the server databases:
The now-shared document D' should never go to database A or database B, and likewise an unshared document X should never go to database AB. Something like this:
```
┌───────────────────────┐
│  server:5984/user_a   │
└───┬───────────────▲───┘
    │               │      ┌──────────────────────────────┐
    │               ●──────│ if (doc.shared_with == null) │
    │               │      └──────────────────────────────┘
┌───▼───────────────┴───┐
│         local         │
└───▲───────────────┬───┘
    │               │      ┌──────────────────────────────┐
    │               ●──────│ if (doc.shared_with == 'ab') │
    │               │      └──────────────────────────────┘
┌───┴───────────────▼───┐
│ server:5984/shares_ab │
└───────────────────────┘
```
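The two push filters from the diagram can be written as plain JavaScript predicates. This is a sketch: `targetsFor` is a hypothetical helper for illustration, and `userADbUrl`/`sharesAbDbUrl` in the comments are placeholders. A deletion written by adding `_deleted: true` to the full document still carries `shared_with`, so it is routed the same way as the document it deletes.

```javascript
// Push filters matching the diagram; pulls (server -> local) are unfiltered.
// local -> server:5984/user_a: only documents that are not shared.
const toUserDb = (doc) => doc.shared_with == null; // null or undefined

// local -> server:5984/shares_<id>: only documents shared in that context.
const toShareDb = (shareId) => (doc) => doc.shared_with === shareId;

// Illustration only: which server databases would accept this document?
function targetsFor(doc, shareIds) {
  const targets = toUserDb(doc) ? ['user_a'] : [];
  for (const id of shareIds) {
    if (toShareDb(id)(doc)) targets.push(`shares_${id}`);
  }
  return targets;
}

console.log(targetsFor({ _id: 'x', title: 'private' }, ['ab'])); // [ 'user_a' ]
console.log(targetsFor({ _id: 'y', shared_with: 'ab' }, ['ab'])); // [ 'shares_ab' ]

// In PouchDB the predicates would go into the `filter` option, e.g.
//   localDB.replicate.to(userADbUrl, { live: true, retry: true, filter: toUserDb });
//   localDB.replicate.to(sharesAbDbUrl, { live: true, retry: true, filter: toShareDb('ab') });
```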
So, assuming the share database(s) are already set up, when the local client wants to share some data it actually adds _deleted:true to the original (unshared) document and creates a new (shared) document. The deleted original propagates to the user_a database on the server and the created copy propagates to the shares_ab.
Unsharing works pretty much the same way: the client adds _deleted:true to the shared document and recreates the data in a new unshared copy. From B's perspective, the document showed up once it was in shares_ab and now disappears because it has been deleted from shares_ab.
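Unsharing, sketched the same way (`unshareMove` and `newId` are hypothetical). The key detail is that the tombstone of the shared copy keeps `shared_with`; with a bare stub, a `shared_with == null` filter would send the deletion to `user_a` instead of `shares_ab`, and B would never see the document disappear.

```javascript
// Sketch of unsharing, the reverse move: tombstone the shared copy and
// recreate the data as a private document. `newId` is caller-supplied.
function unshareMove(sharedDoc, newId) {
  // The tombstone keeps `shared_with`, so a push filter such as
  // doc.shared_with === 'ab' still routes the deletion to shares_ab.
  const tombstone = { ...sharedDoc, _deleted: true };
  // The private copy drops _rev and shared_with, so it routes to user_a only.
  const { _rev, shared_with, ...data } = sharedDoc;
  const copy = { ...data, _id: newId };
  return { copy, tombstone };
}

const shared = { _id: 'todo-1-ab', _rev: '2-def', title: 'Buy milk', shared_with: 'ab' };
const { copy, tombstone } = unshareMove(shared, 'todo-1-private');
console.log(copy); // { _id: 'todo-1-private', title: 'Buy milk' }
```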
(Instead of user "pairs" you could extend this to ad-hoc sets of users, or simplify it to specific groups that users are already in, or whatnot. The key idea is to create an actually shared database for each unique sharing context needed.)
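For ad-hoc sets of users, one way to get exactly one database per sharing context is to derive the name deterministically from the sorted member list. This is a sketch under assumptions: the `shares_` prefix and `shareDbName` are hypothetical, and hex-encoding user names is borrowed from couch_peruser's `userdb-` naming so the result stays a legal CouchDB database name. For large groups the names get long; hashing the sorted list would be an alternative.

```javascript
// Derive one deterministic database name per unique set of users, so that
// {alice, bob} and {bob, alice} map to the same shared database. CouchDB
// database names must start with a lowercase letter and may only contain
// a-z, 0-9 and _ $ ( ) + - /, hence the hex encoding of user names.
const hex = (s) =>
  Array.from(new TextEncoder().encode(s), (b) => b.toString(16).padStart(2, '0')).join('');

function shareDbName(userNames) {
  const members = [...new Set(userNames)].sort();
  if (members.length < 2) throw new Error('a share needs at least two users');
  return 'shares_' + members.map(hex).join('_');
}

console.log(shareDbName(['bob', 'alice'])); // shares_616c696365_626f62
```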

For this kind of project, one database per user is the recommended pattern.
Idea
So basically, you will have:
One private database per user (with write/read permissions)
One central database read-only
The central database needs to be read-only and must expose shared documents only. You need some kind of application proxy for this: build an API on top of the central database that allows access to shared documents only.
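A minimal sketch of the checks such an API might make before answering a request from the central database. The `shared` flag, the status codes chosen and the `authorize`/`visibleDocs` helpers are all assumptions, not part of CouchDB.

```javascript
// Hypothetical authorization check for an API in front of the central
// database: only reads are allowed, and only documents flagged as shared.
function authorize(method, doc) {
  if (method !== 'GET' && method !== 'HEAD') {
    return { status: 405, reason: 'the central database is read-only' };
  }
  // Answer 404 rather than 403 so private documents don't reveal that they exist.
  if (!doc || doc.shared !== true) {
    return { status: 404, reason: 'not found' };
  }
  return { status: 200 };
}

// For list endpoints, drop anything that isn't shared before responding.
const visibleDocs = (docs) => docs.filter((doc) => doc.shared === true);

console.log(authorize('PUT', { shared: true }).status); // 405
console.log(authorize('GET', { shared: false }).status); // 404
```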
Then you can set up replications from each user database to the central database and persist them in the _replicator database.
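The replication side could look roughly like this: a filter in each user database plus one `_replicator` document per user. The database names, the `_design/sharing` document and the `share_` id prefix are hypothetical, and credentials are left out of the URLs. As the previous answer points out, a filter like this cannot unshare: flipping `shared` back to false simply stops further changes from propagating.

```javascript
// Design document to store in each user database. CouchDB filter functions
// are stored as JavaScript source strings.
const sharingDesignDoc = {
  _id: '_design/sharing',
  filters: {
    only_shared: 'function (doc, req) { return doc.shared === true; }',
  },
};

// Document to PUT into the _replicator database, one per user database.
function replicatorDoc(serverUrl, userDb, centralDb) {
  return {
    _id: `share_${userDb}`,
    source: `${serverUrl}/${userDb}`,
    target: `${serverUrl}/${centralDb}`,
    continuous: true,
    filter: 'sharing/only_shared', // "<design doc name>/<filter name>"
  };
}

console.log(replicatorDoc('http://localhost:5984', 'userdb-616c696365', 'shared_todos'));
```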
couchperuser
I'm not sure whether the per-user database plugin works with CouchDB 2.x at the moment, but you can do it yourself with some sort of application process: create the user, then create the database, then set the permissions of the new database.
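The do-it-yourself provisioning boils down to three HTTP requests, sketched here as data that an admin-side process would send. The `provisioningPlan` helper is hypothetical; the `_users` document shape, the `_security` object and the `userdb-` + hex naming are the ones CouchDB and couch_peruser use.

```javascript
// DB name mimics couch_peruser's convention: "userdb-" + hex(username).
const hex = (s) =>
  Array.from(new TextEncoder().encode(s), (b) => b.toString(16).padStart(2, '0')).join('');

function provisioningPlan(name, password) {
  const db = `userdb-${hex(name)}`;
  return [
    // 1. Create the user in the _users database (CouchDB hashes the password).
    {
      method: 'PUT',
      path: `/_users/org.couchdb.user:${encodeURIComponent(name)}`,
      body: { name, password, roles: [], type: 'user' },
    },
    // 2. Create the user's private database.
    { method: 'PUT', path: `/${db}` },
    // 3. Make the new user the only member (reader/writer) of that database.
    {
      method: 'PUT',
      path: `/${db}/_security`,
      body: { admins: { names: [], roles: [] }, members: { names: [name], roles: [] } },
    },
  ];
}

for (const step of provisioningPlan('alice', 's3cret')) console.log(step.method, step.path);
// PUT /_users/org.couchdb.user:alice
// PUT /userdb-616c696365
// PUT /userdb-616c696365/_security
```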

Related

Couchdb add user database configuration

I want to use CouchDB to create an offline-first app where users can add documents.
Only the user who created a document should be able to change it; for everyone else it should be read-only. For this I wanted to use CouchDB's "peruser" mechanism and replicate these documents into a main database that everyone can read.
Is it possible to have the replication and other configuration (like design documents) set up automatically when the database is created by the couch_peruser options?
I found a possible way myself:
add a validation function to the main database to deny writes (http://docs.couchdb.org/en/2.1.1/ddocs/ddocs.html#vdufun)
use _db_updates endpoint to monitor database creation (http://docs.couchdb.org/en/2.1.1/api/server/common.html#db-updates)
create a _replicator document to set up a continuous replication from userdb to main db (http://docs.couchdb.org/en/2.1.1/replication/replicator.html)
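The first and third steps can be sketched in JavaScript as follows. Assumptions: the replications run with admin credentials (so the validation function can simply reject every non-admin write), per-user databases use couch_peruser's `userdb-` prefix, and `onDbUpdate` plus the target database name are hypothetical. The validation function would be stored as a string under `validate_doc_update` in a design document of the main database.

```javascript
// validate_doc_update for the main database: only server admins (which the
// replications are assumed to run as) may write; everyone else gets 403.
function validate_doc_update(newDoc, oldDoc, userCtx, secObj) {
  if (userCtx.roles.indexOf('_admin') === -1) {
    throw { forbidden: 'The main database is read-only.' };
  }
}

// Handler for events from GET /_db_updates?feed=continuous, which look like
// { "db_name": "...", "type": "created", ... }. For each newly created
// per-user database it returns a _replicator document to PUT.
function onDbUpdate(event, serverUrl, mainDb) {
  if (event.type !== 'created' || !event.db_name.startsWith('userdb-')) return null;
  return {
    _id: `to_${mainDb}_${event.db_name}`,
    source: `${serverUrl}/${event.db_name}`,
    target: `${serverUrl}/${mainDb}`,
    continuous: true,
  };
}
```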
One thing to watch out for: maintaining a lot of continuous replications requires a lot of system resources.
Another way is to enforce authorship with design documents. With this approach we don't need to maintain replications to the main database, because every entry can be held in one database (the main database, in my case).
http://guide.couchdb.org/draft/validation.html#authorship
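A minimal authorship check in the spirit of the guide's chapter, written here from scratch rather than copied from it. It assumes every document has an `author` field holding the creator's user name, and it sticks to ES5 syntax because it runs inside CouchDB's JavaScript engine.

```javascript
// validate_doc_update enforcing authorship: only the author may change or
// delete a document, and new revisions must name the current user as author.
function validate_doc_update(newDoc, oldDoc, userCtx, secObj) {
  if (userCtx.roles.indexOf('_admin') !== -1) return; // admins may do anything
  if (!userCtx.name) {
    throw { unauthorized: 'Please log in to write documents.' };
  }
  // Existing documents may only be updated or deleted by their author.
  if (oldDoc && oldDoc.author !== userCtx.name) {
    throw { forbidden: 'Only the author may modify this document.' };
  }
  // Deletions may be bare stubs without an author field, so they are only
  // checked against oldDoc above; every other revision must carry the author.
  if (!newDoc._deleted && newDoc.author !== userCtx.name) {
    throw { forbidden: 'The author field must be your user name.' };
  }
}
```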

Node Express APP 1 to N (with MongoDB)

We are developing a big Node app with Express and MongoDB. We are aiming for the best possible performance, because we will have multiple clients (maybe 100+) running on the same server.
We were thinking of a one-to-N app: one instance, one database, and multiple clients accessing their own domains.
I want to know which setup is best for this scenario (one server, multiple clients), for both performance and development:
One instance, one database (clients data would be identified by a company ObjectId on the entry and clients would access a domain or subroute)
One instance, multiple tables (or databases; which is better?)
Multiple instances, multiple tables
Any other ideas?
With the first setup, developers will always have to worry about the current company, which can limit the app.
With the second setup, that concern remains, but the company won't be mixed into the database entries (a cleaner model).
With the third setup (maybe the best for development), each instance handles only one company, which opens up a lot of possibilities but may cause performance issues, since all instances would run on a single server.
Other setups I haven't thought of may be better.
Notes:
We are using the mongoose library
I have some experience with WordPress and I like the way themes and plugins are created for it. We are trying to achieve a level of performance similar to WordPress with PHP (several WordPress sites running efficiently on one server).
You don't need to manage multiple instances: you can create a company collection that stores every company, and then store a reference to the company in each user. Make sure you create a unique index on the company collection. Such scenarios are also really easy to handle in an RDBMS (MySQL).
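With a single shared database, one way to keep developers from having to "worry about the current company" on every query is to route all filters through a helper that injects the tenant id. This is a hypothetical sketch: `scopedFilter`, the `companyId` field and `req.companyId` are assumptions, and the mongoose call appears only in a comment.

```javascript
// Every query a request makes is forced through the current company's id, so
// a forgotten tenant filter can't leak another company's data. A companyId
// supplied by the caller is overwritten.
function scopedFilter(companyId, filter = {}) {
  if (!companyId) throw new Error('companyId is required');
  return { ...filter, companyId };
}

// With mongoose this would be used as, e.g.
//   User.find(scopedFilter(req.companyId, { active: true }))
console.log(scopedFilter('c42', { active: true, companyId: 'other' }));
// { active: true, companyId: 'c42' }
```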
One more thing: you can also run multiple mongod processes on the same machine just by changing the port, if you are looking for that sort of solution.
Please note the following before using Mongo:
Use Mongo only if you have terabytes of data; it doesn't make much sense to use MongoDB for a few MB or GB of data.
Indexes are a must in Mongo if you want maximum performance.
Mongo performs best when its indexes fit in main memory; if the indexes grow larger than memory, they start being swapped to disk, which is really costly. So make sure you run your application and your database on separate servers.
I still think it would be better to use an RDBMS if you don't have terabytes of data to deal with.
Why this approach:
Let me give you a scenario.
You have 100 companies with 1,000 users each, i.e. 100,000 (1 lakh) records in your user collection. Now if I want to delete, update, or fetch a single user from a single company, I don't need to traverse the whole collection, because I can create a compound index on the user collection using user ID and company ID, or even just run a simple filter query on company ID.
For compound indexes, please read:
https://docs.mongodb.com/manual/core/index-compound/
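To make the compound-index point concrete: per the MongoDB documentation linked above, a compound index also supports queries on its prefixes, so an index on company ID then user ID serves both "one user in a company" and "all users in a company", but not "user ID alone". The function below is a simplified, hypothetical illustration of that rule for equality filters only.

```javascript
// Simplified illustration of MongoDB's index-prefix rule for a compound
// index such as userSchema.index({ companyId: 1, userId: 1 }) in mongoose:
// an equality query can use the leading index fields it constrains, in
// order. Sorts, ranges and the query planner's other options are ignored.
function usablePrefix(indexSpec, queryFilter) {
  const keys = Object.keys(indexSpec);
  let n = 0;
  while (n < keys.length && Object.prototype.hasOwnProperty.call(queryFilter, keys[n])) {
    n += 1;
  }
  return keys.slice(0, n);
}

const idx = { companyId: 1, userId: 1 };
console.log(usablePrefix(idx, { companyId: 'c1', userId: 'u7' })); // [ 'companyId', 'userId' ]
console.log(usablePrefix(idx, { companyId: 'c1' }));               // [ 'companyId' ]
console.log(usablePrefix(idx, { userId: 'u7' }));                  // []
```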
And by the way, I'm not storing the company as an embedded object; I store only the value of its _id from the company collection.

Multiple remote databases, single local database (fancy replication)

I have a PouchDB app that manages users.
Users have a local PouchDB instance that replicates with a single CouchDB database. Pretty simple.
This is where things get a bit complicated. I am introducing the concept of "groups" to my design. Groups will be different CouchDB databases but locally, they should be a part of the user database.
I was reading a bit about "fancy replication" on the PouchDB site, and this seems to be the solution I am after.
Now, my question is, how do I do it? More specifically, How do I replicate from multiple remote databases into a single local one? Some code examples will be super.
From my diagram below, you will notice that I need to essentially add databases dynamically based on the groups the user is in. A critique of my design will also be appreciated.
Should the flow be something like this:
Retrieve all user docs from his/her DB into localUserDB
```javascript
var groupDB = new PouchDB('remote-group-url');
groupDB.replicate.to(localUserDB);
```
(any performance issues with multiple pouchdb instances 0_0?)
Locally, when the user makes a change related to a specific group, we determine the corresponding database and replicate by doing something like:
localUserDB.replicate.to(groupDB) (Do I need filtered replication?)
Replicate from many remote databases to your local one:
```javascript
remoteDB1.replicate.to(localDB);
remoteDB2.replicate.to(localDB);
remoteDB3.replicate.to(localDB);
// etc.
```
Then do a filtered replication from your local database to the remote database that is supposed to receive changes:
```javascript
localDB.replicate.to(remoteDB1, {
  // Only documents flagged for this remote are pushed back to it.
  filter: function (doc) {
    return doc.shouldBeReplicated;
  }
});
```
Why filtered replication? Because your local database contains documents from many sources, and you don't want to replicate everything back to the one remote database.
Why a filter function? Since you are replicating from the local database, there's no performance gain from using design docs, views, etc. Just pass in a filter function; it's simpler. :)
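The same idea generalised to any number of groups, as a sketch: it assumes every document carries a hypothetical `group` field naming the group database it belongs to, and `setUpGroupSync` is not a PouchDB API, just a helper around `replicate.to` with its `live`, `retry` and `filter` options.

```javascript
function setUpGroupSync(localDB, groupDBs) {
  const replications = [];
  for (const [group, remoteDB] of Object.entries(groupDBs)) {
    // Pull: everything in the group database flows into the local database.
    replications.push(remoteDB.replicate.to(localDB, { live: true, retry: true }));
    // Push: only documents tagged with this group go back to its database.
    replications.push(
      localDB.replicate.to(remoteDB, {
        live: true,
        retry: true,
        filter: (doc) => doc.group === group,
      }),
    );
  }
  return replications; // call .cancel() on each one to stop syncing
}

// Usage with real PouchDB instances (URLs are placeholders):
//   const local = new PouchDB('local');
//   const syncs = setUpGroupSync(local, {
//     teama: new PouchDB('https://example.com/teama'),
//     teamb: new PouchDB('https://example.com/teamb'),
//   });
```

Each group adds two live replications. Browsers limit concurrent HTTP connections per host, so with many groups on one server it may be worth measuring before committing to this layout.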
Hope that helps!
Edit: okay, it sounds like the names of the groups that the user belongs to are actually included in the first database, which is what you mean by "iterate over." No, you probably shouldn't do this. :) You are trying to circumvent CouchDB's built-in authentication/privilege system.
Instead you should use CouchDB's built-in roles, apply those roles to the user, and then use a "database per role" scheme to ensure users only have access to their proper group DBs. Users can always query the _users API to see what roles they belong to. Simple!
For more details, read the pouchdb-authentication README.
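The database-per-role lookup could be sketched like this. The `group_` role prefix and naming each group database after its role are hypothetical conventions; the `userCtx` shape is what CouchDB's `GET /_session` returns for the logged-in user, and each group database would list its role in its `_security` `members.roles`.

```javascript
// Map the current user's roles to the group databases they may replicate.
// Assumes role "group_<name>" grants access to a database of the same name.
function groupDbUrls(serverUrl, userCtx) {
  return userCtx.roles
    .filter((role) => role.startsWith('group_'))
    .map((role) => `${serverUrl}/${role}`);
}

// userCtx as found in the GET /_session response: { ok: true, userCtx: { name, roles } }
const userCtx = { name: 'alice', roles: ['group_teama', 'group_teamb', 'editor'] };
console.log(groupDbUrls('https://example.com', userCtx));
```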

CouchDB simple document design: need feedback

I am in the process of designing document storage for CouchDB and would really appreciate some feedback. These documents are to represent "assets".
These databases will also be synced locally to the browser via pouchdb.
Requirements:
Each user can have many assets
Users can share assets with others by providing them with a URI such as (xyz.com/some_id). Once users click this URI, they are considered to have been "joined" and are now part of a group.
Group users can share assets of their own with other members of the group.
My design
Each user will have his/her own database to store assets; let's call it "user". Each user DB will be prefixed with the user's unique ID.
Shared assets will be stored in a separate database; let's call it "group". Shared assets are DUPLICATED here and have an additional userId field (to indicate the creator).
The group database is prefixed with a unique ID, just like a user database.
The reason for storing group assets in a separate database is that when PouchDB runs locally, it only knows about the current user and his/her shared assets. It does not know about other users and should not query these "other" users' databases.
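The duplication step of this design, sketched with hypothetical names: `u<userId>_assets` and `g<groupId>_assets` for databases (CouchDB database names must start with a lowercase letter, hence the letter prefix in case IDs start with a digit; IDs are assumed to be lowercase alphanumeric), and `<userId>:<assetId>` as the `_id` of the copy so assets from different users can't collide in the group database.

```javascript
const userDbName = (userId) => `u${userId}_assets`;
const groupDbName = (groupId) => `g${groupId}_assets`;

// Duplicate a user's asset into the group database, tagged with its creator.
function toGroupCopy(asset, userId) {
  const { _rev, ...data } = asset; // the copy is a new document in another DB
  return { ...data, _id: `${userId}:${asset._id}`, userId };
}

console.log(toGroupCopy({ _id: 'a1', _rev: '3-xyz', name: 'logo.png' }, '42'));
// { _id: '42:a1', name: 'logo.png', userId: '42' }
```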
Any input would be GREATLY appreciated.
Seems like a great design. Another alternative would be to just have one database per group ("role"), and then replicate from a user's group(s) into their local PouchDB.
That might get hairy, though, when it comes time to replicate back to the server, because you're going to have to filter the documents as they leave the user's local database, depending on which group-database they belong to. Still, you're going to have to do that on the server side anyway with your current design.
Either way is fine, honestly. The only downside of your current approach is that documents are duplicated on the server side (once per user-db and once per group-db). On the other hand, your client code becomes dead-simple, because you don't have to do any filtered replication. If you have enough space on your server not to worry about it, then I would definitely go with your approach. :)

CQRS when reading with permissions for large data set

I am trying to understand how the read side of CQRS can work with a large document management application (videos, PDF files, etc.) that we are writing.
We want to show a list of all documents the user has edit permission on (i.e. all the documents the user can edit). There could be tens of thousands of documents that a particular user can edit.
In general I have read that a single "table" (flat structure) should suffice for most screens, and that with permissions you could have a table per role.
How would I design my read model to allow me to quickly get the documents that I can edit for a specific user?
Currently I can see one table holding my documents, another holding the users, and another linking the "editing" role between users and documents. So I am doing joins to get the data for this screen.
Also, there could be roles for deleting, viewing etc.
Is this the correct way in this case?
JD
You can provide a flat table that has a user id along with the respective denormalized document information.
```sql
SELECT * FROM documents_editable_by_user WHERE UserId = @UserId;
SELECT * FROM documents_deletable_by_user WHERE UserId = @UserId;
SELECT * FROM documents_visible_for_user WHERE UserId = @UserId;
```
But you could even dynamically create a table/list per user in your read model store. This becomes quite easy once you switch from a SQL-based read store to NoSQL (if you haven't already.)
Especially when there are tens of thousands of documents visible for or editable by a user, flattened tables can give a real performance boost compared to joins.
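The flat-table idea can also be sketched as an in-memory projection fed by domain events. Everything here is hypothetical (event names, fields, the `PermissionReadModel` class); a real read store would persist these rows, and events such as a document rename would need their own handler that updates every affected row.

```javascript
class PermissionReadModel {
  constructor() {
    this.documents = new Map(); // docId -> denormalized document fields
    this.tables = new Map(); // `${permission}:${userId}` -> Map(docId -> row)
  }

  apply(event) {
    switch (event.type) {
      case 'DocumentCreated':
        this.documents.set(event.docId, { title: event.title, kind: event.kind });
        break;
      case 'PermissionGranted': {
        const key = `${event.permission}:${event.userId}`;
        if (!this.tables.has(key)) this.tables.set(key, new Map());
        // Copy the document's fields into the row so reads need no join.
        this.tables.get(key).set(event.docId, { docId: event.docId, ...this.documents.get(event.docId) });
        break;
      }
      case 'PermissionRevoked': {
        const table = this.tables.get(`${event.permission}:${event.userId}`);
        if (table) table.delete(event.docId);
        break;
      }
      default:
        break; // other events don't affect this read model
    }
  }

  // Roughly: SELECT * FROM documents_editable_by_user WHERE UserId = ? (for 'edit')
  query(permission, userId) {
    const table = this.tables.get(`${permission}:${userId}`);
    return table ? [...table.values()] : [];
  }
}

const model = new PermissionReadModel();
model.apply({ type: 'DocumentCreated', docId: 'd1', title: 'Spec.pdf', kind: 'pdf' });
model.apply({ type: 'PermissionGranted', permission: 'edit', userId: 'u1', docId: 'd1' });
console.log(model.query('edit', 'u1')); // [ { docId: 'd1', title: 'Spec.pdf', kind: 'pdf' } ]
```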
When I had a read model that took the form of a filtering-search-form (pun not intended), I used rhino-security as the foundation of an authorization service.
I configured the system so that the authorization service's tables got pushed through SQL Server's pub-sub system and SQL Server Agent, to the clients that were partially displaying the denormalized data - I then let Rhino.Security join the authorization model together into the read model, on a per-user basis.
Because I essentially never wrote to the read model's authorization tables from the read model, we got nice encapsulation of the authorization service's database and logic: authorization was only changed through that service, and it was globally unique and specific (consistent) to that service. This meant that our custom GUIs for handling advanced authorization requirements (hierarchical entities, user groups, users, permissions, per-entity permissions) could still do CRUD against this authorization model, and the changes would be pushed in soft real time to any read model.
