Multiple remote databases, single local database (fancy replication) - couchdb

I have a PouchDB app that manages users.
Users have a local PouchDB instance that replicates with a single CouchDB database. Pretty simple.
This is where things get a bit complicated. I am introducing the concept of "groups" to my design. Groups will be different CouchDB databases but locally, they should be a part of the user database.
I was reading a bit about "fancy replication" on the PouchDB site and this seems to be the solution I am after.
Now, my question is: how do I do it? More specifically, how do I replicate from multiple remote databases into a single local one? Some code examples would be great.
From my diagram below, you will notice that I need to essentially add databases dynamically based on the groups the user is in. A critique of my design will also be appreciated.
Should the flow be something like this:
Retrieve all user docs from his/her DB into localUserDB
var groupDB = new PouchDB('remote-group-url');
groupDB.replicate.to(localUserDB);
(any performance issues with multiple pouchdb instances 0_0?)
Locally, when the user makes a change related to a specific group, we determine the corresponding database and replicate by doing something like:
localUserDB.replicate.to(groupDB) (Do I need filtered replication?)

Replicate from many remote databases to your local one:
remoteDB1.replicate.to(localDB);
remoteDB2.replicate.to(localDB);
remoteDB3.replicate.to(localDB);
// etc.
Then do a filtered replication from your local database to the remote database that is supposed to receive changes:
localDB.replicate.to(remoteDB1, {
  filter: function (doc) {
    return doc.shouldBeReplicated;
  }
});
Why filtered replication? Because your local database contains documents from many sources, and you don't want to replicate everything back to the one remote database.
Why a filter function? Since you are replicating from the local database, there's no performance gain from using design docs, views, etc. Just pass in a filter function; it's simpler. :)
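To make the filter concrete, here is a minimal sketch that assumes every document carries a `groupId` field naming the group database it belongs to. That field and the helper names `makeGroupFilter` and `pushGroupChanges` are illustrative assumptions, not part of PouchDB; only `replicate.to` and its `filter` option come from the snippet above.

```javascript
// Assumption: each document has a `groupId` field identifying its group.
// Returns a filter that lets through only documents of the given group.
function makeGroupFilter(groupId) {
  return function (doc) {
    return doc.groupId === groupId;
  };
}

// localDB and remoteGroupDB are expected to be PouchDB instances.
// Pushes only the documents of one group to that group's remote database.
function pushGroupChanges(localDB, remoteGroupDB, groupId) {
  return localDB.replicate.to(remoteGroupDB, {
    filter: makeGroupFilter(groupId)
  });
}
```

One caveat with this sketch: a tombstone created by `remove()` typically keeps only `_id`, `_rev`, and `_deleted`, so a deletion would probably not pass a field-based filter unless the document is deleted by saving it with `_deleted: true` while keeping `groupId`.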
Hope that helps!
Edit: okay, it sounds like the names of the groups that the user belongs to are actually included in the first database, which is what you mean by "iterate over." No, you probably shouldn't do this. :) You are trying to circumvent CouchDB's built-in authentication/privilege system.
Instead you should use CouchDB's built-in roles, apply those roles to the user, and then use a "database per role" scheme to ensure users only have access to their proper group DBs. Users can always query the _users API to see what roles they belong to. Simple!
For more details, read the pouchdb-authentication README.
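As a rough illustration of the "database per role" idea, the sketch below builds the security object that CouchDB stores at `/{db}/_security`, granting membership to a single role. The `group-<id>` role naming and the helper names are assumptions made for the example; only the `admins`/`members` shape with `names` and `roles` arrays is CouchDB's own.

```javascript
// Assumption: each group has its own database and a role named "group-<id>".
function groupRole(groupId) {
  return 'group-' + groupId;
}

// Security object for a group database: only holders of the group role are members.
function buildGroupSecurity(groupId) {
  return {
    admins: { names: [], roles: [] },
    members: { names: [], roles: [groupRole(groupId)] }
  };
}

// Given the `roles` array from a user's document, list the group ids
// whose databases the client should replicate from.
function groupIdsFromRoles(roles) {
  var prefix = 'group-';
  return roles
    .filter(function (role) { return role.indexOf(prefix) === 0; })
    .map(function (role) { return role.slice(prefix.length); });
}
```

With this layout the client never decides which groups it may read; it only asks which roles it holds, and the server rejects replication from any group database whose role the user lacks.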

Related

Multi-tenancy Architecture in a graph DB

I would like to share my thoughts with you and try to get some advice. I would like to define my application with the best architecture as possible. Any comment would be highly appreciated. Here we go...
My technologies: NestJS (Node), Neo4j/ArangoDB (graph DB), Nginx as a proxy (microservices approach).
My business case: SaaS application. Many customers with many users, one database per customer and the same code (just one instance) of our codebase.
We have a set of data models that will be the same for all customers, but the relations between them will differ. From my research, a graph DB is the best match for such operations, so I'm planning to create a separate instance/database for each customer; otherwise too many relations will make it harder to scale.
Problem: From my point of view, the problem can be seen from two different angles.
I need to allow multiple users to connect to different databases at the same time with the same code (just one installation). In a NestJS app, how can I change the database configuration on each API request? Should I save the DB URI in a table and fetch it based on the user/customer type? Other concerns: does this affect latency, and if a request fails, is there any possibility that it fetches data from the wrong DB?
How can we create sub-graphs in Neo4j/ArangoDB so that we can fetch a sub-graph based on the customer?
On the other hand, I found a couple of interesting links:
https://neo4j.com/developer/multi-tenancy-worked-example/
https://www.arangodb.com/enterprise-server/oneshard/
https://dzone.com/articles/multitenant-graph-applications
Could someone provide me with additional info?
Thanks for your time
Best regards
With ArangoDB, a solution that works is:
Use a single database for all customers
Use Foxx microservices in that database to provide access to all data
Enforce a tenantId value on every call to Foxx
Use dedicated collections for each tenant in that database
Set up a web server (e.g. Node.js) in front of ArangoDB that serves data to all tenants
Only allow connections to Foxx from that front end web server
Each tenant will need a few collections, depending on your solution; try to keep that number as low as possible.
This model works pretty well, and you're able to migrate customers between instances / regions as their data is portable, because it's in collections.
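The "enforce a tenantId" and "dedicated collections per tenant" points can be sketched as a small helper that refuses to build a collection name without a valid tenant. This is plain JavaScript rather than Foxx API code, and the `t_<tenant>_<name>` naming scheme, the id format, and the function name are all assumptions made for illustration.

```javascript
// Assumption: tenant ids are short lowercase alphanumeric strings starting with a letter.
var TENANT_ID_PATTERN = /^[a-z][a-z0-9]{0,31}$/;

// Maps a tenant id and a logical collection name to that tenant's own collection.
// Throws when the tenant id is missing or malformed, so no call can proceed without one.
function tenantCollection(tenantId, baseName) {
  if (typeof tenantId !== 'string' || !TENANT_ID_PATTERN.test(tenantId)) {
    throw new Error('Missing or invalid tenantId');
  }
  return 't_' + tenantId + '_' + baseName;
}
```

Routing every data access through one helper like this keeps the tenant check in a single place, which is the property the list above relies on.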

Couchdb add user database configuration

I want to use CouchDB to create an offline-first app, where users can add documents.
Only the user who created a document should be able to change it; for everyone else it should be read-only. For this I wanted to use the "peruser" mechanism of CouchDB and replicate these documents into a main database that everyone can read.
Is it possible to automatically get the replication and other configurations (like design documents) configured when the database is created by the couch_peruser options?
I found a possible way myself:
add a validation function to the main database to deny writes (http://docs.couchdb.org/en/2.1.1/ddocs/ddocs.html#vdufun)
use _db_updates endpoint to monitor database creation (http://docs.couchdb.org/en/2.1.1/api/server/common.html#db-updates)
create a _replicator document to set up a continuous replication from userdb to main db (http://docs.couchdb.org/en/2.1.1/replication/replicator.html)
One thing to watch out for is that maintaining a lot of continuous replications requires a lot of system resources.
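The third step can be sketched as a function that builds the `_replicator` document for one user. It relies on couch_peruser naming each database `userdb-` followed by the hex-encoded user name; the `_id` scheme, the function names, and the omission of credentials are assumptions for the example, and a real setup would need authentication on `source` and `target`.

```javascript
// couch_peruser names each database "userdb-" plus the hex-encoded user name.
function userDbName(username) {
  return 'userdb-' + Buffer.from(username, 'utf8').toString('hex');
}

// Builds a _replicator document for continuous replication from a user's
// database into the shared main database. Credentials are left out here.
function buildReplicationDoc(serverUrl, username, mainDbName) {
  var source = userDbName(username);
  return {
    _id: 'to-main-' + source, // hypothetical id scheme, one document per user
    source: serverUrl + '/' + source,
    target: serverUrl + '/' + mainDbName,
    continuous: true
  };
}
```

A listener on `_db_updates` could call `buildReplicationDoc` whenever a new `userdb-` database appears and save the result into the `_replicator` database.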
Another way is to enforce authorship with design documents. With this approach we don't need to maintain replications to the main database, because every entry can be held in one database (the main database in my case).
http://guide.couchdb.org/draft/validation.html#authorship
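A minimal sketch of such an authorship check follows, assuming each document stores its creator's user name in an `author` field (the field name follows the linked guide; the exact rules here are an illustration, not the guide's code). It is written in ES5 style in case the server's JavaScript engine is an older one.

```javascript
// validate_doc_update function: CouchDB calls it on every write to the database.
// Throwing an object with a `forbidden` key rejects the write.
function validateAuthorship(newDoc, oldDoc, userCtx) {
  // Server admins may write anything.
  if (userCtx.roles.indexOf('_admin') !== -1) {
    return;
  }
  // Existing documents may only be changed or deleted by their author.
  if (oldDoc && oldDoc.author !== userCtx.name) {
    throw { forbidden: 'Only the author may change this document.' };
  }
  // New or updated documents must name the current user as author.
  if (!newDoc._deleted && newDoc.author !== userCtx.name) {
    throw { forbidden: 'The author field must match your user name.' };
  }
}

// The function source is stored as a string in a design document.
var authorshipDesignDoc = {
  _id: '_design/authorship',
  validate_doc_update: validateAuthorship.toString()
};
```

Because reads are not affected by `validate_doc_update`, every user can still read every document while writes stay restricted to the author.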

CouchDB simple document design: need feedback

I am in the process of designing document storage for CouchDB and would really appreciate some feedback. These documents are to represent "assets".
These databases will also be synced locally to the browser via pouchdb.
Requirements:
Each user can have many assets
Users can share assets with others by providing them with a URI such as (xyz.com/some_id). Once users click this URI, they are considered to have been "joined" and are now part of a group.
Group users can share assets of their own with other members of the group.
My design
Each user will have his/her own database to store assets - let's call it "user". Each user DB will be prefixed with his/her unique ID.
Shared assets will be stored in a separate database - let's call it "group". Shared assets are duplicated here and have an additional field for userId (to indicate the creator).
The group database is prefixed with a unique ID, just like a user database.
The reason for storing group assets in a separate database is that when PouchDB runs locally, it only knows about the current user and his/her shared assets. It does not know about other users and should not query these other users' databases.
Any input would be GREATLY appreciated.
Seems like a great design. Another alternative would be to just have one database per group ("role"), and then replicate from a user's group(s) into their local PouchDB.
That might get hairy, though, when it comes time to replicate back to the server, because you're going to have to filter the documents as they leave the user's local database, depending on which group-database they belong to. Still, you're going to have to do that on the server side anyway with your current design.
Either way is fine, honestly. The only downside of your current approach is that documents are duplicated on the server side (once per user-db and once per group-db). On the other hand, your client code becomes dead-simple, because you don't have to do any filtered replication. If you have enough space on your server not to worry about it, then I would definitely go with your approach. :)

PouchDB as a real live data tool for different collections

I'm thinking of using PouchDB as a solution to automatically update comments that are submitted by users on papers.
It should mimic the behavior of a subscribe/publish service. Whenever someone submits a comment in his client, the list of comments on an other client should automatically update.
This is possible using PouchDB as described in the getting started guide:
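var db = new PouchDB('paper');
var remoteCouch = 'http://user:pass@mname.iriscouch.com/paper';
// Log replication failures instead of silently dropping them.
function syncError(err) {
  console.error('Replication error:', err);
}
// Start continuous two-way replication with the remote database.
function sync() {
  var opts = {live: true};
  db.replicate.to(remoteCouch, opts, syncError);
  db.replicate.from(remoteCouch, opts, syncError);
}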
The app holds different papers, each with their own comments. When using PouchDB as my publish/subscribe service, I have these questions:
Is it a good idea to use PouchDB this way?
If I only want to sync the comments of the current paper a user is working on, should I create a new database for each paper? (This would also mean I would lose the possibility to query for example all the users comments in all the papers from a single database)
Is there a way to only sync a part of the database? This way I could still use the database to hold all the comments even for different papers.
Yep, PouchDB works fine for real-time stuff. It doesn't use web sockets, but it uses long-polling, which is fast enough for most use cases.
It sounds like you probably should create a separate database for each paper, assuming you want to restrict access on a per-paper basis. CouchDB authentication is kinda tricky, but basically if you want to control read access, you can either give users full read access or zero read access to an entire database. There's a writeup here.
Also don't worry about creating thousands of databases; a "database" is cheap in CouchDB.
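A database-per-paper setup might look like the sketch below. The `paper-<id>` naming scheme and the helper names are assumptions, and the PouchDB constructor is passed in as a parameter so the example does not depend on how the library is loaded; `sync` with the `live` and `retry` options is the PouchDB call doing the work.

```javascript
// Assumption: one remote database per paper, named "paper-<id>".
// CouchDB database names must be lowercase, hence the toLowerCase().
function paperDbName(paperId) {
  return 'paper-' + String(paperId).toLowerCase();
}

// Opens the local database for one paper and starts live two-way sync
// with the matching remote database. Returns both so the caller can
// query `local` and later call `handler.cancel()` when leaving the paper.
function syncPaper(PouchDB, remoteBaseUrl, paperId) {
  var name = paperDbName(paperId);
  var local = new PouchDB(name);
  var handler = local.sync(remoteBaseUrl + '/' + name, { live: true, retry: true });
  return { local: local, handler: handler };
}
```

Only the paper currently open is synced this way; switching papers means cancelling the old handler and calling `syncPaper` again with the new id.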
The only other thing I would advise is that maybe you would like the relational-pouch plugin, because then you could easily set up a relational-style database with a "paper" type and a "comment" type.

ACL best practices, store roles in user object, or separate table/collection?

I am using nodejs, and have been researching acl/authorization for the past week. I have found only a couple, but none seem to have all the features I require. The closest has been https://github.com/OptimalBits/node_acl, but I don't think it supports protecting resources by id (for example, if I wanted to allow user 12345 and only user 12345 to access user/12345/edit). Hence, I think I will have to make a custom acl solution for myself.
My question regarding this is, what are some pros and cons to storing roles (user, admin, moderator, etc.) under each user object, as opposed to creating another collection/table that maps each user with their authorization rules? node_acl uses a separate collection, whereas most of the other ones depend on the roles array in user objects.
By the way, I am using MongoDB at the moment. However, I have not yet researched the pros and cons of using relational vs. non-relational databases for authorization, so let me know if your answer depends on that.
As I was typing this up, I thought of one thing. If I store roles in a separate collection, it is more portable. I would be able to swap out the acl system much more easily. (I think?)
The question here seems like it could be abstracted from "where should I store my roles" to "how should I store related information in Mongo (or NoSQL in general)". It's a relational vs. non-relational modeling issue.
Non-Relational
Using Node + Mongo, storing the roles on the user will make it really easy to determine if a user has access to the feature, given that you can just look in the 'roles' property. The trade off is that you have lots of duplicate information ('user_read' could be a role on every user account) and if you end up changing that property, you'll need to update it inside every user object.
You could store the roles in their own collection and then store the id for that entry in the Roles collection on your User model, but then you'll still need to fetch the actual record from the collection to display any of its information (though arguably this could be a rare occurrence).
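The two layouts can be compared side by side in a small sketch. Plain objects and a `Map` stand in for MongoDB documents and a separate collection, and all names and sample data here are hypothetical; the last helper shows the per-resource rule from the question (only user 12345, or an admin, may edit user/12345).

```javascript
// Layout 1: roles embedded in the user object.
var embeddedUser = { id: '12345', roles: ['user', 'moderator'] };

function hasRoleEmbedded(user, role) {
  return user.roles.indexOf(role) !== -1;
}

// Layout 2: roles kept in a separate collection keyed by user id.
// A Map stands in for that collection; a real lookup would be a query.
var roleAssignments = new Map([['12345', ['user', 'moderator']]]);

function hasRoleReferenced(assignments, userId, role) {
  var roles = assignments.get(userId) || [];
  return roles.indexOf(role) !== -1;
}

// Per-resource check: a user may edit their own record; admins may edit any.
function canEditUser(user, targetUserId) {
  return user.id === targetUserId || hasRoleEmbedded(user, 'admin');
}
```

The embedded layout answers a role check from the user object alone, while the referenced layout needs the extra lookup but keeps role data out of the user schema.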
Relational
Storing these in a relational DB would be a more "traditional" approach in that you can establish the relationships between the tables (via FKs / join tables or what not). This can be a good solution, but then you no longer have the benefits of using a NoSQL database.
Summary
If the rest of your app is stored in Mongo and has to stay there (for performance or whatever constraint) then you are probably better off doing it all in Mongo. Most of the advice I've come across says don't mix & match data stores, e.g. use one or the other, but not both. That being said, I've done projects with both and it can get messy but sometimes the pros outweigh the cons.
I like @DavidWelch's answer, but I'd like to tackle the question from another perspective because the library mentioned gives the option to use a different data store entirely.
Storing roles in a separate data store:
(Pro) Can make the system more performant if you are using a faster data store. (More advantageous in distributed environments?)
(Con) You will have to ensure consistency between the two data stores.
General notes:
You can add roles/permissions such as 'blog\123' in acl. You can also give a user permissions based on verbs such as put, delete, get, etc.
I think it is easier to create a pluggable solution that does not depend on your storage implementation. Perhaps that is why acl does not store roles in the same collections you have.
If you choose to keep the roles in your own collection, consider adding them to a token (JWT). That way, you will not have to check your collection for every request that needs authorization.
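To show how roles can travel inside a token, here is a minimal HS256 sketch built only on Node's `crypto` module. The function names and the `roles` claim are assumptions for the example, there is no expiry handling, and in practice a maintained JWT library would be the safer choice; this is only meant to show why no collection lookup is needed per request.

```javascript
const crypto = require('crypto');

// Base64url-encode a string or Buffer (no padding, URL-safe alphabet).
function base64url(input) {
  return Buffer.from(input).toString('base64')
    .replace(/=/g, '').replace(/\+/g, '-').replace(/\//g, '_');
}

function hmac(data, secret) {
  return base64url(crypto.createHmac('sha256', secret).update(data).digest());
}

// Creates a signed token whose payload carries the user's roles.
function signToken(payload, secret) {
  const header = base64url(JSON.stringify({ alg: 'HS256', typ: 'JWT' }));
  const body = base64url(JSON.stringify(payload));
  return header + '.' + body + '.' + hmac(header + '.' + body, secret);
}

// Returns the payload if the signature is valid, otherwise null.
function verifyToken(token, secret) {
  const parts = token.split('.');
  if (parts.length !== 3) return null;
  const given = Buffer.from(parts[2]);
  const expected = Buffer.from(hmac(parts[0] + '.' + parts[1], secret));
  if (given.length !== expected.length || !crypto.timingSafeEqual(given, expected)) {
    return null;
  }
  const json = Buffer.from(parts[1].replace(/-/g, '+').replace(/_/g, '/'), 'base64');
  return JSON.parse(json.toString('utf8'));
}

// Authorization check that reads roles from the verified claims only.
function tokenHasRole(claims, role) {
  return claims !== null && Array.isArray(claims.roles) && claims.roles.indexOf(role) !== -1;
}
```

The trade-off is that a role change only takes effect once a new token is issued, so short token lifetimes matter with this design.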
I hope that helped.
