couchdb _find security, filtering _find returned results - couchdb

I know that we can define a validation function for updates in replication. But what about find/select queries?
I am wondering if is it possible to put mandatory filter to all _find queries eg:
filter password field from person documents? or put automatic filter to show only that user's documents.I don't want a user be able to see other user's documents in the same collection.
Is it possible to put a java api in this regard?

I work at Couchbase, so I'll answer the question from the Couchbase angle.
Couchbase 7.0 (currently available in Beta at time of writing) introduces scopes and collections. One of the use cases for scopes is to support multi-tenancy.
You create multiple "scopes" within a bucket, then give each database user access to only certain scopes.

Related

Possible to add data into commands on certain MongoDB collections?

Is it possible to add data into commands on certain MongoDB collections?
The use case here is for simple management of multitenancy. We have data that doesn't contain the tenant's id and we then want to insert the tenant's id in on every command (find, update, updateOne, insert, insertMany, etc.) to particular collections (some collections are generic tenant wide collections). We are using the MongoDB driver (rather not use Mongoose).
Currently, we have to remember to add the tenant id whenever we use a command, but this is a bit dangerous as it is possible to miss adding the tenant's id...
Thanks!
Out of the box Mongo does not come with this feature set.
Mongoose actually shines for these kind of things with its middleware infrastructure.
If mongoose is not an option you can look into these smaller hooks implementations:
https://www.npmjs.com/package/mongohooks
https://www.npmjs.com/package/mongodb-hooks
They give you something to work with I believe.
In the end we created a helper function which we passed the data to. The helper then adds the tenant's id to any commands nessasay and send the command to MongoDB.
This allows us to control when to restrict by tenant per collection per command type.

Multiple remote databases, single local database (fancy replication)

I have a PouchDB app that manages users.
Users have a local PouchDB instance that replicates with a single CouchDB database. Pretty simple.
This is where things get a bit complicated. I am introducing the concept of "groups" to my design. Groups will be different CouchDB databases but locally, they should be a part of the user database.
I was reading a bit about "fancy replication" in the pouchDB site and this seems to be the solution I am after.
Now, my question is, how do I do it? More specifically, How do I replicate from multiple remote databases into a single local one? Some code examples will be super.
From my diagram below, you will notice that I need to essentially add databases dynamically based on the groups the user is in. A critique of my design will also be appreciated.
Should the flow be something like this:
Retrieve all user docs from his/her DB into localUserDB
var groupDB = new PouchDB('remote-group-url');
groupDB.replicate.to(localUserDB);
(any performance issues with multiple pouchdb instances 0_0?)
Locally, when the user makes a change related to a specific group, we determine the corresponding database and replicate by doing something like:
localUserDB.replicate.to(groupDB) (Do I need filtered replication?)
Replicate from many remote databases to your local one:
remoteDB1.replicate.to(localDB);
remoteDB2.replicate.to(localDB);
remoteDB3.replicate.to(localDB);
// etc.
Then do a filtered replication from your local database to the remote database that is supposed to receive changes:
localDB.replicate.to(remoteDB1, {
filter: function (doc) {
return doc.shouldBeReplicated;
}
});
Why filtered replication? Because your local database contains documents from many sources, and you don't want to replicate everything back to the one remote database.
Why a filter function? Since you are replicating from the local database, there's no performance gain from using design docs, views, etc. Just pass in a filter function; it's simpler. :)
Hope that helps!
Edit: okay, it sounds like the names of the groups that the user belongs to are actually included in the first database, which is what you mean by "iterate over." No, you probably shouldn't do this. :) You are trying to circumvent CouchDB's built-in authentication/privilege system.
Instead you should use CouchDB's built-in roles, apply those roles to the user, and then use a "database per role" scheme to ensure users only have access to their proper group DBs. Users can always query the _users API to see what roles they belong to. Simple!
For more details, read the pouchdb-authentication README.

Securing the data accessed by Neo4j queries

I wish to implement security on the data contained in a Neo4j database down to the level of individual nodes and/or relationships.
Most data will be available to all users but some data will be restricted by user role. I can add either properties or labels to the data that I wish to restrict.
I want to allow users to run custom cypher queries against the data but hide any data that the user isn't authorised to see.
If I have to do something from the outside then not only do I have to filter the results returned but I also have to parse and either restrict or modify all queries that are run against the data to prevent a user from writing a query which acted on data that they aren't allowed to view.
The ideal solution would be if there is a low-level hook that allows intercepting the reads of nodes and relationships BEFORE a cypher query acts on those records. The interceptor would perform the security checks and if they fail then it would behave as though the node or relationship didn't exist at all. i.e. the same cypher query would have different results depending on who ran it. And this would apply to all possible queries e.g. count(n) not just those that returned the nodes/relationships.
Can something like this be done? If it's not supported already, is there a suitable place in the code that I could add such a security filter or would it require many code changes?
Thanks, Damon
As Chris stated, it's certainly not trivial on database level, but if you're looking for a solution on application level, you might have a look at Structr, a framework on top of and tightly integrated with Neo4j.
It provides node-level security based on ACLs, with users, groups, and different access levels. The security in Structr is implemented on the lowest level possible, f.e. we only instantiate objects if the querying user has the approriate access rights.
All higher access levels like REST API and UI see only the records available in the user's context.
[1] http://structr.org, https://github.com/structr/structr

ACL best practices, store roles in user object, or separate table/collection?

I am using nodejs, and have been researching acl/authorization for the past week. I have found only a couple, but none seem to have all the features I require. The closest has been https://github.com/OptimalBits/node_acl, but I don't think it supports protecting resources by id (for example, if I wanted to allow user 12345 and only user 12345 to access user/12345/edit). Hence, I think I will have to make a custom acl solution for myself.
My question regarding this is, what are some pros and cons to storing roles (user, admin, moderator, etc.) under each user object, as opposed to creating another collection/table that maps each user with their authorization rules? node_acl uses a separate collection, whereas most of the other ones depend on the roles array in user objects.
By the way, I am using Mongodb at the moment. However I have not researched the pros and cons yet of using relational vs. nonrelational databases for authentication yet, so if let me know if your answer depends on that.
As I was typing this up, I thought of one thing. If I store roles in a separate collection, it is more portable. I would be able to swap out the acl system much more easily. (I think?)
The question here seems like it could be abstracted from "where should I store my roles" to "how should I store related information in Mongo (or NoSQL in general)". It's a relation vs non-relational modeling issue.
Non-Relational
Using Node + Mongo, storing the roles on the user will make it really easy to determine if a user has access to the feature, given that you can just look in the 'roles' property. The trade off is that you have lots of duplicate information ('user_read' could be a role on every user account) and if you end up changing that property, you'll need to update it inside every user object.
You could store the roles in their own collection and then store the id for that entry in the Roles collection on your User model, but then you'll still need to fetch the actual record from the collection to display any of it's information (though arguably this could be a rare occurrence)
Relational
Storing these in a relational DB would be a more "traditional" approach in that you can establish the relationships between the tables (via FKs / join tables or what not). This can be a good solution, but then you no longer have the benefits of using a NoSQL database.
Summary
If the rest of your app is stored in Mongo and has to stay there (for performance or whatever constraint) then you are probably better off doing it all in Mongo. Most of the advice I've come across says don't mix & match data stores, e.g. use one or the other, but not both. That being said, I've done projects with both and it can get messy but sometimes the pros outweigh the cons.
I like #DavidWelch answer, but I'd like to tackle the question from another perspective because the library mentioned gives the option to use a different data store entirely.
Storing roles in a separate data store:
(Pro) Can make the system more performant if you are using a faster data store. (More advantageous in distributed environments?)
(Con) You will have to ensure consistency between the two data stores.
General notes:
You can add roles/permissions such as 'blog\123' in acl. You can also give a user permissions based on verbs such as put, delete, get, etc..
I think it is easier to create a pluggable solution that does not depend on your storage implementation. Perhaps that is why acl does not store roles in the same collections you have.
If you choose to keep the roles in your own collection, consider adding them to a token (JWT). That way, you will not have to check your collection for every request that needs authorization.
I hope that helped.

CQRS when reading with permissions for large data set

I am trying to understand how the read side of CQRS can work with a large document management application (videos/pdf files/ etc) that we are writing.
We want to show a list of all documents which the user has edit permission on (i.e. show all the documents the user can edit).There could be 10,000s of documents that a particular user could edit.
In general I have read that the a single "table" (flat structure) should suffice for most screens and with permissions you could have a table per role.
How would I design my read model to allow me to quickly get the documents that I can edit for a specific user?
Currently I can see a table holding holding my documents, another holding the users and another table that links the "editing" role between the user and the documents. So I am doing joins to get the data for this screen.
Also, there could be roles for deleting, viewing etc.
Is this the correct way in this case?
JD
You can provide a flat table that has a user id along with the respective denormalized document information.
SELECT * FROM documents_editable_by_user WHERE UserId = #UserId
SELECT * FROM documents_deletable_by_user WHERE UserId = #UserId
SELECT * FROM documents_visible_for_user WHERE UserId = #UserId
But you could even dynamically create a table/list per user in your read model store. This becomes quite easy once you switch from a SQL-based read store to NoSQL (if you haven't already.)
Especially when there are tens of thousands of documents visible for or editable by a user, flattened tables can give a real performance boost compared to joins.
When I had a read model that took the form of a filtering-search-form (pun not intended), I used rhino-security as the foundation of an authorization service.
I configured the system so that the authorization service's tables got pushed through SQL Server's pub-sub system and SQL Server Agent, to the clients that were partially displaying the denormalized data - I then let Rhino.Security join the authorization model together into the read model, on a per-user basis.
Because I essentially never wrote to the read model's authorization tables from the read model, we got a nice encapsulation on the authorization service's database and logic, because authorization was only changed through that service, and it was globally unique and specific (consistent) to that service. This meant that our custom GUIs for handling advanced (hierarchial entities, user groups, users, permissions, per-entity-permissions) authorization requirements could still do CRUD against this authorization model and that would be pushed in soft real time to any read model.

Resources