I wish to implement security on the data contained in a Neo4j database down to the level of individual nodes and/or relationships.
Most data will be available to all users but some data will be restricted by user role. I can add either properties or labels to the data that I wish to restrict.
I want to allow users to run custom cypher queries against the data but hide any data that the user isn't authorised to see.
If I have to do something from the outside then not only do I have to filter the results returned but I also have to parse and either restrict or modify all queries that are run against the data to prevent a user from writing a query which acted on data that they aren't allowed to view.
The ideal solution would be if there is a low-level hook that allows intercepting the reads of nodes and relationships BEFORE a cypher query acts on those records. The interceptor would perform the security checks and if they fail then it would behave as though the node or relationship didn't exist at all. i.e. the same cypher query would have different results depending on who ran it. And this would apply to all possible queries e.g. count(n) not just those that returned the nodes/relationships.
Can something like this be done? If it's not supported already, is there a suitable place in the code that I could add such a security filter or would it require many code changes?
Thanks, Damon
As Chris stated, it's certainly not trivial on database level, but if you're looking for a solution on application level, you might have a look at Structr, a framework on top of and tightly integrated with Neo4j.
It provides node-level security based on ACLs, with users, groups, and different access levels. The security in Structr is implemented on the lowest level possible, f.e. we only instantiate objects if the querying user has the approriate access rights.
All higher access levels like REST API and UI see only the records available in the user's context.
[1] http://structr.org, https://github.com/structr/structr
Related
I'm currently reading a lot about access control possibilites/mechanisms that can be used to protect resources in an application or web application. There's ACL, RBAC, ABAC and many other concepts out there.
Assuming I have developed a simple webservice that returns knowledgebase articles on a route like '/api/article'. The 'controller' connects to the database and fetches all articles and returns them as XML or JSON.
Now I would like to have control over which article in the database is accessible for which user or group. So for instance if user 'peter' accesses the route '/api/article' with his credentials, the webservice shall return only articles that are visible for 'peter'.
I would want to use ACL to control what each user/group can read/write/delete. But what I don't quite understand:
Where does one enforce the access control? Do I just fetch all records in the controller if a user accesses the route '/api/articles' and check each single record against an access control list (that doesn't sound very good performance wise)? Or is there a way that the 'SELECT' statement to the database only return the records that can actually be seen by that specific user?
I really tried hard to find more information on that topic, and there is a lot about different access control mechanisms, but not about where and how the actual enforcement happens...and it even get's more complex if it comes to other actions like modification, deletion and so on...
This is really a matter of implementation and everyone does it its own way. It also depends on the nature of the data, particularly on the size of your authorization (do your have 5 roles and users are attached to them or does each user have a specific set of articles he can access, different for each user - for instance)
If you do not want to make the processing on the controller, you could store the authorization information in your database, in a table which links a user to a set of KB articles, or a role (which would then be reflected in the article). In that case your SELECT query would just pass the authenticated user ID. This requires that the maintenance of the relationship is done of the database, which may not be obvious.
Alternatively you can store the ACL on the controller and build a query from there - for specific articles or groups of articles.
Getting all the articles and checking them on the controller is not a good idea (as you mention), DBs have been designed also to avoid such issues.
While GraphQL mentions security should be delegated to underlying business logic, the nature of GraphQL lends itself very well to security.
In GraphQL the Query can have a resolve method, also each field can have a resolve method. In a way we are traversing the graph, if we provide resolvers for each query and all fields of their results.
Now "Attribute Based Access Control" is gaining popularity with its ways to define security policy across
Subject
Resource
Action
Environment
One way "Attribute Based Access Control" is implemented is, that it modifies the query being fired to only fetch eligible data. This could be done by a wrapper resolver.
Second way "Attribute Based Access Control" can be implement in GraphQL, is to use field level resolvers to decide whether to expose that field or not.
The question I have to the community is what are the various ways to implement "Attribute Based Access Control" in GraphQL, especially leveraging the strengths of GraphQL
Cheers,
Rohit
There are 2 ways ABAC could be used to secure data - be it GraphQL; SQL; HQL... - like you say:
Either you modify the incoming query so that the modified query only retrieves the entitled data. This is for instance how some database proxies work. It intercepts 'SELECT a, b, c FROM t' and converts it into 'SELECT a, b, c FROM t WHERE...' Axiomatics does that with its Data Access Filter.
Or you configure the underlying system so that it only allows access under the right circumstances. We call that provisioning. Years ago, for instance, MySQL had a feature called FGAC - fine-grained access control that could be used to that effect.
The benefit of 1. is that it is unintrusive. It sits in front of the data source and could in principle work for several types of data sources e.g. SQL, GraphQL... The benefit of 2. is that you do not need the proxy component and the configuration is native to the target system.
In any case, yes Graph databases lend themselves really well to ABAC because of the relationship between the different entities. In a way, relational databases have that too but perhaps not as obvious.
Consider a typical Breeze controller that limits the results of a query to entities that the logged in user has access to. When the browser calls SaveChanges, does Breeze verify on the server that the entities reported as modified are from the original set?
To put it another way, does the EFContextProvider (in the case Entity Framework) keep track of entities that have been handed out, so it can check against malicious data passed to SaveChanges? Or does BeforeSaveEntity need to validate that the user has access to the changed entities?
You must guard against malicious data in your BeforeSaveEntity or BeforeSaveEntities methods.
The idea that the EFContextProvider would keep track of entities that have already been handed out is probably something that we would NOT want to do because
The EFContextProvider would no longer be stateless, which was a design goal to facilitate scaling.
You would still need to guard against malicious data for "Added" entities in the BeforeXXX methods.
It is actually a valid use case for some of our users to "modify" entities without having first queried them.
I am using nodejs, and have been researching acl/authorization for the past week. I have found only a couple, but none seem to have all the features I require. The closest has been https://github.com/OptimalBits/node_acl, but I don't think it supports protecting resources by id (for example, if I wanted to allow user 12345 and only user 12345 to access user/12345/edit). Hence, I think I will have to make a custom acl solution for myself.
My question regarding this is, what are some pros and cons to storing roles (user, admin, moderator, etc.) under each user object, as opposed to creating another collection/table that maps each user with their authorization rules? node_acl uses a separate collection, whereas most of the other ones depend on the roles array in user objects.
By the way, I am using Mongodb at the moment. However I have not researched the pros and cons yet of using relational vs. nonrelational databases for authentication yet, so if let me know if your answer depends on that.
As I was typing this up, I thought of one thing. If I store roles in a separate collection, it is more portable. I would be able to swap out the acl system much more easily. (I think?)
The question here seems like it could be abstracted from "where should I store my roles" to "how should I store related information in Mongo (or NoSQL in general)". It's a relation vs non-relational modeling issue.
Non-Relational
Using Node + Mongo, storing the roles on the user will make it really easy to determine if a user has access to the feature, given that you can just look in the 'roles' property. The trade off is that you have lots of duplicate information ('user_read' could be a role on every user account) and if you end up changing that property, you'll need to update it inside every user object.
You could store the roles in their own collection and then store the id for that entry in the Roles collection on your User model, but then you'll still need to fetch the actual record from the collection to display any of it's information (though arguably this could be a rare occurrence)
Relational
Storing these in a relational DB would be a more "traditional" approach in that you can establish the relationships between the tables (via FKs / join tables or what not). This can be a good solution, but then you no longer have the benefits of using a NoSQL database.
Summary
If the rest of your app is stored in Mongo and has to stay there (for performance or whatever constraint) then you are probably better off doing it all in Mongo. Most of the advice I've come across says don't mix & match data stores, e.g. use one or the other, but not both. That being said, I've done projects with both and it can get messy but sometimes the pros outweigh the cons.
I like #DavidWelch answer, but I'd like to tackle the question from another perspective because the library mentioned gives the option to use a different data store entirely.
Storing roles in a separate data store:
(Pro) Can make the system more performant if you are using a faster data store. (More advantageous in distributed environments?)
(Con) You will have to ensure consistency between the two data stores.
General notes:
You can add roles/permissions such as 'blog\123' in acl. You can also give a user permissions based on verbs such as put, delete, get, etc..
I think it is easier to create a pluggable solution that does not depend on your storage implementation. Perhaps that is why acl does not store roles in the same collections you have.
If you choose to keep the roles in your own collection, consider adding them to a token (JWT). That way, you will not have to check your collection for every request that needs authorization.
I hope that helped.
I am trying to understand how the read side of CQRS can work with a large document management application (videos/pdf files/ etc) that we are writing.
We want to show a list of all documents which the user has edit permission on (i.e. show all the documents the user can edit).There could be 10,000s of documents that a particular user could edit.
In general I have read that the a single "table" (flat structure) should suffice for most screens and with permissions you could have a table per role.
How would I design my read model to allow me to quickly get the documents that I can edit for a specific user?
Currently I can see a table holding holding my documents, another holding the users and another table that links the "editing" role between the user and the documents. So I am doing joins to get the data for this screen.
Also, there could be roles for deleting, viewing etc.
Is this the correct way in this case?
JD
You can provide a flat table that has a user id along with the respective denormalized document information.
SELECT * FROM documents_editable_by_user WHERE UserId = #UserId
SELECT * FROM documents_deletable_by_user WHERE UserId = #UserId
SELECT * FROM documents_visible_for_user WHERE UserId = #UserId
But you could even dynamically create a table/list per user in your read model store. This becomes quite easy once you switch from a SQL-based read store to NoSQL (if you haven't already.)
Especially when there are tens of thousands of documents visible for or editable by a user, flattened tables can give a real performance boost compared to joins.
When I had a read model that took the form of a filtering-search-form (pun not intended), I used rhino-security as the foundation of an authorization service.
I configured the system so that the authorization service's tables got pushed through SQL Server's pub-sub system and SQL Server Agent, to the clients that were partially displaying the denormalized data - I then let Rhino.Security join the authorization model together into the read model, on a per-user basis.
Because I essentially never wrote to the read model's authorization tables from the read model, we got a nice encapsulation on the authorization service's database and logic, because authorization was only changed through that service, and it was globally unique and specific (consistent) to that service. This meant that our custom GUIs for handling advanced (hierarchial entities, user groups, users, permissions, per-entity-permissions) authorization requirements could still do CRUD against this authorization model and that would be pushed in soft real time to any read model.