CouchDB - prevent unauthorized reads

CouchDB has a mechanism in place to prevent unauthorized writes.
Can it also prevent unauthorized reads?

Yes, CouchDB can prevent unauthorized reads. Unfortunately, it is slightly less straightforward.
Imagine a secret auction application. You bid $20 and I bid $10; each bid in a couch document. Couch lets us read our own bid documents but no others. However, there is a map-reduce view showing the average. I load the view and see that the average is $15, so I conclude that your bid is $20 and I have broken the security policy. View output can leak some or all of a document's information. It is not feasible to enforce security at the document level. That is why read access is at the database level.
I know, it sucks. But that is the only correct, scalable answer.
This is part of the reason the Couch philosophy is to create many databases—even one (or more!) per user. Read permission to a database is set in the readers value of the database _security object. (Note, the field readers was renamed to members in CouchDB trunk because it also specifies who may write to the DB.)
The technique works like this:
1. Create a database for each user. It will hold all documents the user may read. Add the user (or the user's role) to the _security object.
2. In the master database, create a filter function which implements the read policy. (It could share code with validate_doc_update.)
3. Replicate from the master database to the user's database with ?filter=my_filter_function.
4. Allow the user to load (or replicate from) their database.
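The steps above can be sketched as the JSON bodies one would send to CouchDB. This is only a sketch under assumptions, not a definitive setup: the database names (master, userdb-alice), the design document name (policy), and the readers field that the filter checks are all made up for illustration. The code only builds the payloads; sending them (PUT /{db}/_security, PUT of the design document, POST /_replicate) is left out, and depending on your CouchDB version the replication source and target may need to be full URLs rather than bare names.

```python
import json

# Hypothetical names, chosen for illustration only.
MASTER_DB = "master"
DESIGN_DOC = "policy"
FILTER_NAME = "my_filter_function"

# JavaScript filter stored in the master database's design document.
# Assumption: each document lists who may read it in a "readers" array.
FILTER_SOURCE = """function (doc, req) {
  return !!doc.readers && doc.readers.indexOf(req.query.user) !== -1;
}"""


def security_object(user):
    """Body for PUT /{user_db}/_security: only `user` is a member."""
    return {
        "admins": {"names": [], "roles": []},
        "members": {"names": [user], "roles": []},
    }


def design_document():
    """Body for the design document in the master DB that holds the filter."""
    return {
        "_id": "_design/" + DESIGN_DOC,
        "filters": {FILTER_NAME: FILTER_SOURCE},
    }


def replication_request(user, user_db):
    """Body for POST /_replicate: filtered copy from master to the user's DB."""
    return {
        "source": MASTER_DB,
        "target": user_db,
        "filter": DESIGN_DOC + "/" + FILTER_NAME,
        "query_params": {"user": user},  # seen by the filter as req.query.user
    }


if __name__ == "__main__":
    print(json.dumps(security_object("alice"), indent=2))
    print(json.dumps(design_document(), indent=2))
    print(json.dumps(replication_request("alice", "userdb-alice"), indent=2))
```

Passing the user name through query_params lets one filter function serve every per-user replication, instead of writing one filter per user.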
Of course, this is all for a pure Couch application, where users access Couch directly. If you have a middle layer (MVC controller, or just a reverse HTTP proxy), then you can enforce policy there, between the user and the couch. But be careful. For example, a _show function or a _rewrite rule might allow a user to load a view or document despite your policy.
Good luck!

Related

Access Control in a Web Application

I'm currently reading a lot about access control possibilities/mechanisms that can be used to protect resources in an application or web application. There's ACL, RBAC, ABAC and many other concepts out there.
Assuming I have developed a simple webservice that returns knowledgebase articles on a route like '/api/article'. The 'controller' connects to the database and fetches all articles and returns them as XML or JSON.
Now I would like to have control over which article in the database is accessible for which user or group. So for instance if user 'peter' accesses the route '/api/article' with his credentials, the webservice shall return only articles that are visible for 'peter'.
I would want to use ACL to control what each user/group can read/write/delete. But what I don't quite understand:
Where does one enforce the access control? Do I just fetch all records in the controller if a user accesses the route '/api/articles' and check each single record against an access control list (that doesn't sound very good performance wise)? Or is there a way to make the 'SELECT' statement to the database return only the records that can actually be seen by that specific user?
I really tried hard to find more information on that topic, and there is a lot about different access control mechanisms, but not about where and how the actual enforcement happens...and it gets even more complex when it comes to other actions like modification, deletion and so on...
This is really a matter of implementation, and everyone does it their own way. It also depends on the nature of the data, particularly on the size of your authorization model: for instance, do you have 5 roles with users attached to them, or does each user have a specific set of articles he can access, different for each user?
If you do not want to do the processing in the controller, you could store the authorization information in your database, in a table which links a user to a set of KB articles, or to a role (which would then be reflected in the article). In that case your SELECT query would just pass the authenticated user ID. This requires that the relationship be maintained in the database, which may not be obvious.
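To make the "SELECT just passes the authenticated user ID" idea concrete, here is a small sketch using an in-memory SQLite database. The table and column names (article, article_acl, can_read) and the sample rows are assumptions for illustration, not something from the question.

```python
import sqlite3


def demo_connection():
    """Build a throwaway database with two articles and one read grant each."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE article (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
        CREATE TABLE article_acl (
            article_id INTEGER NOT NULL REFERENCES article(id),
            user_id    TEXT    NOT NULL,
            can_read   INTEGER NOT NULL DEFAULT 0,
            PRIMARY KEY (article_id, user_id)
        );
        INSERT INTO article VALUES (1, 'VPN setup'), (2, 'Payroll FAQ');
        INSERT INTO article_acl VALUES (1, 'peter', 1), (2, 'mary', 1);
        """
    )
    return conn


def visible_articles(conn, user_id):
    """Return only the articles that `user_id` has a read grant for."""
    rows = conn.execute(
        """
        SELECT a.id, a.title
        FROM article AS a
        JOIN article_acl AS acl ON acl.article_id = a.id
        WHERE acl.user_id = ? AND acl.can_read = 1
        ORDER BY a.id
        """,
        (user_id,),  # bound parameter: the authenticated user's ID
    )
    return rows.fetchall()


if __name__ == "__main__":
    print(visible_articles(demo_connection(), "peter"))
```

The controller never sees rows the user may not read, so there is nothing to filter after the query returns.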
Alternatively you can store the ACL on the controller and build a query from there - for specific articles or groups of articles.
Getting all the articles and checking them in the controller is not a good idea (as you mention); databases are designed, among other things, to avoid exactly this kind of work.

CouchDB simple document design: need feedback

I am in the process of designing document storage for CouchDB and would really appreciate some feedback. These documents are to represent "assets".
These databases will also be synced locally to the browser via pouchdb.
Requirements:
Each user can have many assets
Users can share assets with others by providing them with a URI such as (xyz.com/some_id). Once users click this URI, they are considered to have been "joined" and are now part of a group.
Group users can share assets of their own with other members of the group.
My design
Each user will have his/her own database to store assets - let's call it "user". Each user DB will be prefixed with his/her unique ID.
Shared assets will be stored in a separate database - let's call it "group". Shared assets are DUPLICATED here and have an additional field for userId (to indicate creator).
Group database is prefixed with a unique ID just like a user database is prefixed with one too.
The reason for storing group assets in a separate database is that when pouchdb runs locally, it only knows about the current user and his/her shared assets. It does not know about other users and should not query these "other" users' databases.
Any input would be GREATLY appreciated.
Seems like a great design. Another alternative would be to just have one database per group ("role"), and then replicate from a user's group(s) into their local PouchDB.
That might get hairy, though, when it comes time to replicate back to the server, because you're going to have to filter the documents as they leave the user's local database, depending on which group-database they belong to. Still, you're going to have to do that on the server side anyway with your current design.
Either way is fine, honestly. The only downside of your current approach is that documents are duplicated on the server side (once per user-db and once per group-db). On the other hand, your client code becomes dead-simple, because you don't have to do any filtered replication. If you have enough space on your server not to worry about it, then I would definitely go with your approach. :)

Most efficient way to determine CouchDB access permission

I'm using the CouchDB permission system with per-db-and-user access rights. Each DB represents an app, which are being displayed in a home-screen-like overview and in other places. I need an efficient way to make CouchDB tell me whether a user has access to a db or not - for example a GET /_all_dbs that only returns the DBs for which current user has access. Polling a view or document turns out to be too slow once there are more than a dozen or so apps to display on one page, although I could still tune a view poll with limit=1. Isn't there a better way though?
Query the _security document of the database.
curl http://localhost:5984/db_name/_security
{"admins":{"names":["dbadmin"],"roles":["reader"]},"members":{"names":[],"roles":[]}}
For every database that has admins/users, CouchDB creates a special document called _security that holds a list of all the users for that database. You can make a curl request to that document and get a JSON object listing all the admins and members for that database.
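Once you have that JSON, the check itself is cheap. Below is a rough sketch of how one might evaluate it client-side. The function name has_access is made up, and the rules encoded here are my understanding of CouchDB's behaviour (server admins always get in, and a database with no members defined is public), so treat it as an approximation rather than the server's exact logic.

```python
def has_access(security, name, roles=()):
    """Approximate whether a user may read a database, given its _security object.

    `security` is the parsed JSON from GET /{db}/_security.
    `name` and `roles` describe the user being checked.
    """
    roles = set(roles)
    if "_admin" in roles:
        return True  # server admins can access every database

    admins = security.get("admins", {})
    members = security.get("members", {})

    # Assumption: with no member names or roles defined, the database is public.
    if not members.get("names") and not members.get("roles"):
        return True

    for section in (admins, members):
        if name in section.get("names", []):
            return True
        if roles & set(section.get("roles", [])):
            return True
    return False


if __name__ == "__main__":
    sample = {
        "admins": {"names": ["dbadmin"], "roles": ["reader"]},
        "members": {"names": [], "roles": []},
    }
    print(has_access(sample, "anyone"))
```

Note that the sample _security object shown above has empty members, so under these rules it describes a public database.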
Edit
You know your application best but here is a strategy that I think could be helpful? Every couchdb user is stored in the _users database. It is just like any other database. You can create a view on it and then query it. You can even add additional fields to the documents to help with querying. How about when you create a user on a database you update the corresponding document in the _users database as well.
Now if you call _users/_all_docs?include_docs=true you get a list of users along with the databases they have access to. One request and you have everything you need.

Should I validate access permissions each request?

I've always wondered whether it's better to check the database for account access permissions every single request, or cache (say, an ACL) in the session state.
My current case isn't particularly mission-critical, but I feel it would be annoying to have to logout and log back in to refresh cached credentials. I've also considered using a temporary data store, with a TTL. Seems like it might be the best of both.
Security wise, it is better to check the DB every time for permissions. The vulnerability is that if the user's permissions are reduced after the session is created, they could still have a higher level of access than they should.
There are a few things you can do to stay secure without performing a full query, provided you're early enough in the development cycle. If you have role-based access control (RBAC), you can store a fast lookup table that contains a user's role. If the user's role changes during the session, you mark the permissions "dirty" in the lookup table, which triggers a query of the DB for the new role. As long as the user's role stays the same, there's no need to query the DB. The lookup table, then, is basically just a flag that you can set on the backend if the user's role changes. This same technique can be used even with individual access controls, provided the granularity is not too fine. If it is, it starts to bloat your server. We use this technique at work to speed up transactions.
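A minimal sketch of that dirty-flag lookup table, assuming a single process and an in-memory dictionary. The class name RoleCache and the load_role callback are hypothetical; in a real deployment the table would live somewhere all your servers can see it.

```python
class RoleCache:
    """Cache each user's role and re-query the DB only when flagged dirty."""

    def __init__(self, load_role):
        # load_role(user_id) -> role; stands in for the real database query.
        self._load_role = load_role
        self._roles = {}
        self._dirty = set()

    def mark_dirty(self, user_id):
        """Call this from whatever backend code changes a user's role."""
        self._dirty.add(user_id)

    def role_for(self, user_id):
        """Return the cached role, reloading it if missing or marked dirty."""
        if user_id in self._dirty or user_id not in self._roles:
            self._roles[user_id] = self._load_role(user_id)
            self._dirty.discard(user_id)
        return self._roles[user_id]


if __name__ == "__main__":
    fake_db = {"u1": "editor"}
    cache = RoleCache(lambda user_id: fake_db[user_id])
    print(cache.role_for("u1"))   # first call hits the "database"
    fake_db["u1"] = "viewer"
    cache.mark_dirty("u1")        # the role change sets the flag
    print(cache.role_for("u1"))   # flag forces a reload
```

The safety of this scheme depends entirely on every code path that changes a role also calling mark_dirty; a missed call leaves a stale role in the cache.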
If you are late in the development cycle or if you value simplicity more than performance (simple is usually more secure), then I would query the DB every time unless the load gets too heavy for the DB.

CQRS when reading with permissions for large data set

I am trying to understand how the read side of CQRS can work with a large document management application (videos/pdf files/ etc) that we are writing.
We want to show a list of all documents which the user has edit permission on (i.e. show all the documents the user can edit). There could be 10,000s of documents that a particular user could edit.
In general I have read that a single "table" (flat structure) should suffice for most screens and with permissions you could have a table per role.
How would I design my read model to allow me to quickly get the documents that I can edit for a specific user?
Currently I can see a table holding my documents, another holding the users and another table that links the "editing" role between the user and the documents. So I am doing joins to get the data for this screen.
Also, there could be roles for deleting, viewing etc.
Is this the correct way in this case?
JD
You can provide a flat table that has a user id along with the respective denormalized document information.
SELECT * FROM documents_editable_by_user WHERE UserId = #UserId
SELECT * FROM documents_deletable_by_user WHERE UserId = #UserId
SELECT * FROM documents_visible_for_user WHERE UserId = #UserId
But you could even dynamically create a table/list per user in your read model store. This becomes quite easy once you switch from a SQL-based read store to NoSQL (if you haven't already.)
Especially when there are tens of thousands of documents visible for or editable by a user, flattened tables can give a real performance boost compared to joins.
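As a rough illustration of keeping such a flat table up to date, here is a sketch with SQLite. The handler names (on_edit_granted, on_edit_revoked) and the columns beyond UserId are assumptions; in a real CQRS setup they would be driven by whatever events your write side publishes.

```python
import sqlite3


def create_read_model(conn):
    """One denormalized row per (user, document) the user may edit."""
    conn.execute(
        """
        CREATE TABLE documents_editable_by_user (
            UserId     TEXT NOT NULL,
            DocumentId TEXT NOT NULL,
            Title      TEXT NOT NULL,
            PRIMARY KEY (UserId, DocumentId)
        )
        """
    )


def on_edit_granted(conn, user_id, document_id, title):
    """Projection handler: a user gained edit permission on a document."""
    conn.execute(
        "INSERT OR REPLACE INTO documents_editable_by_user VALUES (?, ?, ?)",
        (user_id, document_id, title),
    )


def on_edit_revoked(conn, user_id, document_id):
    """Projection handler: a user lost edit permission on a document."""
    conn.execute(
        "DELETE FROM documents_editable_by_user"
        " WHERE UserId = ? AND DocumentId = ?",
        (user_id, document_id),
    )


def editable_documents(conn, user_id):
    """The read side: a single-table lookup with no joins."""
    return conn.execute(
        "SELECT DocumentId, Title FROM documents_editable_by_user"
        " WHERE UserId = ? ORDER BY DocumentId",
        (user_id,),
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_read_model(conn)
    on_edit_granted(conn, "jd", "doc-1", "Intro video")
    on_edit_granted(conn, "jd", "doc-2", "Spec PDF")
    on_edit_revoked(conn, "jd", "doc-1")
    print(editable_documents(conn, "jd"))
```

The cost moves to the write side: every permission change has to update the flat table, which is the usual trade in a CQRS read model.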
When I had a read model that took the form of a filtering-search-form (pun not intended), I used rhino-security as the foundation of an authorization service.
I configured the system so that the authorization service's tables got pushed through SQL Server's pub-sub system and SQL Server Agent, to the clients that were partially displaying the denormalized data - I then let Rhino.Security join the authorization model together into the read model, on a per-user basis.
Because I essentially never wrote to the read model's authorization tables from the read model, we got a nice encapsulation of the authorization service's database and logic: authorization was only changed through that service, and it was globally unique and specific (consistent) to that service. This meant that our custom GUIs for handling advanced (hierarchical entities, user groups, users, permissions, per-entity-permissions) authorization requirements could still do CRUD against this authorization model and that would be pushed in soft real time to any read model.
