CQRS when reading with permissions for large data set - domain-driven-design

I am trying to understand how the read side of CQRS can work with a large document management application (videos/pdf files/ etc) that we are writing.
We want to show a list of all documents which the user has edit permission on (i.e. show all the documents the user can edit).There could be 10,000s of documents that a particular user could edit.
In general I have read that the a single "table" (flat structure) should suffice for most screens and with permissions you could have a table per role.
How would I design my read model to allow me to quickly get the documents that I can edit for a specific user?
Currently I can see a table holding holding my documents, another holding the users and another table that links the "editing" role between the user and the documents. So I am doing joins to get the data for this screen.
Also, there could be roles for deleting, viewing etc.
Is this the correct way in this case?
JD

You can provide a flat table that has a user id along with the respective denormalized document information.
SELECT * FROM documents_editable_by_user WHERE UserId = #UserId
SELECT * FROM documents_deletable_by_user WHERE UserId = #UserId
SELECT * FROM documents_visible_for_user WHERE UserId = #UserId
But you could even dynamically create a table/list per user in your read model store. This becomes quite easy once you switch from a SQL-based read store to NoSQL (if you haven't already.)
Especially when there are tens of thousands of documents visible for or editable by a user, flattened tables can give a real performance boost compared to joins.

When I had a read model that took the form of a filtering-search-form (pun not intended), I used rhino-security as the foundation of an authorization service.
I configured the system so that the authorization service's tables got pushed through SQL Server's pub-sub system and SQL Server Agent, to the clients that were partially displaying the denormalized data - I then let Rhino.Security join the authorization model together into the read model, on a per-user basis.
Because I essentially never wrote to the read model's authorization tables from the read model, we got a nice encapsulation on the authorization service's database and logic, because authorization was only changed through that service, and it was globally unique and specific (consistent) to that service. This meant that our custom GUIs for handling advanced (hierarchial entities, user groups, users, permissions, per-entity-permissions) authorization requirements could still do CRUD against this authorization model and that would be pushed in soft real time to any read model.

Related

Access Control in a Web Application

I'm currently reading a lot about access control possibilites/mechanisms that can be used to protect resources in an application or web application. There's ACL, RBAC, ABAC and many other concepts out there.
Assuming I have developed a simple webservice that returns knowledgebase articles on a route like '/api/article'. The 'controller' connects to the database and fetches all articles and returns them as XML or JSON.
Now I would like to have control over which article in the database is accessible for which user or group. So for instance if user 'peter' accesses the route '/api/article' with his credentials, the webservice shall return only articles that are visible for 'peter'.
I would want to use ACL to control what each user/group can read/write/delete. But what I don't quite understand:
Where does one enforce the access control? Do I just fetch all records in the controller if a user accesses the route '/api/articles' and check each single record against an access control list (that doesn't sound very good performance wise)? Or is there a way that the 'SELECT' statement to the database only return the records that can actually be seen by that specific user?
I really tried hard to find more information on that topic, and there is a lot about different access control mechanisms, but not about where and how the actual enforcement happens...and it even get's more complex if it comes to other actions like modification, deletion and so on...
This is really a matter of implementation and everyone does it its own way. It also depends on the nature of the data, particularly on the size of your authorization (do your have 5 roles and users are attached to them or does each user have a specific set of articles he can access, different for each user - for instance)
If you do not want to make the processing on the controller, you could store the authorization information in your database, in a table which links a user to a set of KB articles, or a role (which would then be reflected in the article). In that case your SELECT query would just pass the authenticated user ID. This requires that the maintenance of the relationship is done of the database, which may not be obvious.
Alternatively you can store the ACL on the controller and build a query from there - for specific articles or groups of articles.
Getting all the articles and checking them on the controller is not a good idea (as you mention), DBs have been designed also to avoid such issues.

CouchDB simple document design: need feedback

I am in the process of designing document storage for CouchDB and would really appreciate some feedback. These documents are to represent "assets".
These databases will also be synced locally to the browser via pouchdb.
Requirements:
Each user can have many assets
Users can share assets with others by providing them with a URI such as (xyz.com/some_id). Once users click this URI, they are considered to have been "joined" and are now part of a group.
Group users can share assets of their own with other members of the group.
My design
Each user will have his/her own database to store assets - let's call it "user". Each user DB will be prefixed with the his/her unique ID.
Shared assets will be stored in a separate database - let's call it "group". shared assets are DUPLICATED here and have an additional field for userId (to indicate creator).
Group database is prefixed with a unique ID just like a user database is prefixed with one too.
The reason for storing group assets in a separate database is because when pouchdb runs locally, it only knows about the current user and his/her shared assets. It does not know about other users and will should not query these "other" users' databases.
Any input would be GREATLY appreciated.
Seems like a great design. Another alternative would be to just have one database per group ("role"), and then replicate from a user's group(s) into their local PouchDB.
That might get hairy, though, when it comes time to replicate back to the server, because you're going to have to filter the documents as they leave the user's local database, depending on which group-database they belong to. Still, you're going to have to do that on the server side anyway with your current design.
Either way is fine, honestly. The only downside of your current approach is that documents are duplicated on the server side (once per user-db and once per group-db). On the other hand, your client code becomes dead-simple, because you don't have to do any filtered replication. If you have enough space on your server not to worry about it, then I would definitely go with your approach. :)

Securing the data accessed by Neo4j queries

I wish to implement security on the data contained in a Neo4j database down to the level of individual nodes and/or relationships.
Most data will be available to all users but some data will be restricted by user role. I can add either properties or labels to the data that I wish to restrict.
I want to allow users to run custom cypher queries against the data but hide any data that the user isn't authorised to see.
If I have to do something from the outside then not only do I have to filter the results returned but I also have to parse and either restrict or modify all queries that are run against the data to prevent a user from writing a query which acted on data that they aren't allowed to view.
The ideal solution would be if there is a low-level hook that allows intercepting the reads of nodes and relationships BEFORE a cypher query acts on those records. The interceptor would perform the security checks and if they fail then it would behave as though the node or relationship didn't exist at all. i.e. the same cypher query would have different results depending on who ran it. And this would apply to all possible queries e.g. count(n) not just those that returned the nodes/relationships.
Can something like this be done? If it's not supported already, is there a suitable place in the code that I could add such a security filter or would it require many code changes?
Thanks, Damon
As Chris stated, it's certainly not trivial on database level, but if you're looking for a solution on application level, you might have a look at Structr, a framework on top of and tightly integrated with Neo4j.
It provides node-level security based on ACLs, with users, groups, and different access levels. The security in Structr is implemented on the lowest level possible, f.e. we only instantiate objects if the querying user has the approriate access rights.
All higher access levels like REST API and UI see only the records available in the user's context.
[1] http://structr.org, https://github.com/structr/structr

How to construct this database in mongodb?

say i have two collections in mongodb,one for users which contains users basic info,and one for apps which contains applications.now if users are allowed to add apps,and the next time when they login, the web should get the user added app for them.how should i construct this kind of database in mongodb.
users:{_id:ObjectId(),username:'username',password:'password'}
apps:{_id:ObjectId(),appname:'',developer:,description:};
and how should i get the user added app for them??? should i add something like addedAppId like:
users:{_id:ObjectId(),username:'username',password:'password',addedApppId:[]}
to indicate which app they have added,and then get the apps using addedAppId???
Yep, there's nothing wrong with the users collection keeping track of which apps a user has added like you've indicated.
This is known as linking in MongoDB. In a relational system you'd probably create a separate table, added_apps, which had a user ID, app ID and any other relevant information. But since you can't join, keeping this information in the users collection is entirely appropriate.
From the docs:
A key question when designing a MongoDB schema is when to embed and when to link. Embedding is the nesting of objects and arrays inside a BSON document. Links are references between documents.
There are no joins in MongoDB – distributed joins would be difficult on a 1,000 server cluster. Embedding is a bit like "prejoined" data. Operations within a document are easy for the server to handle; these operations can be fairly rich. Links in contrast must be processed client-side by the application; the application does this by issuing a follow-up query.
(this is the extra bit you'd need to do, fetch app information from the user's stored AppId.)
Generally, for "contains" relationships between entities, embedding should be be chosen. Use linking when not using linking would result in duplication of data.
...
Many to many relationships are generally done by linking.

CouchDB - prevent unauthorized reads

CouchDB has a mechanism in place to prevent unauthorized writes.
Can it also prevent unauthorized reads?
Yes, CouchDB can prevent unauthorized reads. Unfortunately, it is slightly less straightforward.
Imagine a secret auction application. You bid $20 and I bid $10; each bid in a couch document. Couch lets us read our own bid documents but no others. However, there is a map-reduce view showing the average. I load the view and see that the average is $15, so I conclude that your bid is $20 and I have broken the security policy. View output can leak some or all of a document's information. It is not feasible to enforce security at the document level. That is why read access is at the database level.
I know, it sucks. But that is the only correct, scalable answer.
This is part of the reason the Couch philosophy is to create many databases—even one (or more!) per user. Read permission to a database is set in the readers value of the database _security object. (Note, the field readers was renamed to members in CouchDB trunk because it also specifies who may write to the DB.)
The technique works like this:
Create a database for each user. It will hold all documents the user may read. Add the user (or the user's role) to the _security object.
In the master database, create a filter function which implements the read policy. (It could share code with validate_doc_update.)
Replicate from the master database to the user's database with ?filter=my_filter_function.
Allow the user to load (or replicate from) their database.
Of course, this is all for a pure Couch application, where users access Couch directly. If you have a middle layer (MVC controller, or just a reverse HTTP proxy), then you can enforce policy there, between the user and the couch. But be careful. For example, a _show function or a _rewrite rule might allow a user to load a view or document despite your policy.
Good luck!

Resources