Modelling a permissions system - security

How would you model a system that handles permissions for carrying out certain actions inside an application?

Security models are a large (and open) field of research. There's a huge array of models available to choose from, ranging from the simple:
Lampson's Access control matrix lists every domain object and every principal in the system with the actions that principal is allowed to perform on that object. It is very verbose and, if actually implemented in this fashion, very memory-intensive.
Access control lists are a simplification of Lampson's matrix: think of them as a sparse-matrix implementation that lists objects, principals, and allowed actions, without encoding all the "null" entries from Lampson's matrix (a minimal sketch follows this list). Access control lists can include 'groups' as a convenience, and the lists can be stored by object or by principal (sometimes by program, as in AppArmor, TOMOYO, or LIDS).
Capability systems are based on the idea of holding a reference or pointer to objects; a process has access to an initial set of capabilities, and can gain more capabilities only by receiving them from other objects on the system. This sounds pretty far-out, but think of Unix file descriptors: a descriptor is an unforgeable reference to a specific open file, and it can be handed to other processes or not. If you give the descriptor to another process, that process has access to the file. Entire operating systems were written around this idea. (The most famous are probably KeyKOS and EROS, but I'm sure this is a debatable point. :)
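To make the ACL idea concrete, here is a minimal sketch in Python (the objects, principals, and actions are invented for the example), treating the list as a sparse version of Lampson's matrix where absent entries mean "deny":

    # Sparse ACL: only the non-empty cells of Lampson's matrix are stored.
    # Keys are objects; values map principals to their allowed actions.
    acl = {
        "/etc/passwd": {"root": {"read", "write"}, "alice": {"read"}},
        "/home/alice/notes.txt": {"alice": {"read", "write"}},
    }

    def is_allowed(principal: str, obj: str, action: str) -> bool:
        """Absent entries are the matrix's 'null' cells: deny by default."""
        return action in acl.get(obj, {}).get(principal, set())

    assert is_allowed("alice", "/etc/passwd", "read")
    assert not is_allowed("alice", "/etc/passwd", "write")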
... to the more complex, which have security labels assigned to objects and principals:
Security Rings, as implemented in Multics and x86 CPUs among others, provide security traps or gates that allow processes to transition between rings; each ring has a different set of privileges and objects.
Denning's Lattice is a model of which principals are allowed to interact with which security labels in a very hierarchical fashion.
Bell-LaPadula is similar to Denning's Lattice and provides rules to prevent leaking top-secret data to unclassified levels; common extensions add further compartmentalization and categorization to better support military-style 'need to know' (a sketch of its read/write rules follows this list).
The Biba Model is similar to Bell-LaPadula, but 'turned on its head' -- Bell-LaPadula is focused on confidentiality, but does nothing for integrity, and Biba is focused on integrity, but does nothing for confidentiality. (Bell-LaPadula prevents someone from reading The List Of All Spies, but would happily allow anyone to write anything into it. Biba would happily allow anyone to read The List Of All Spies, but forbid nearly everyone to write into it.)
Type Enforcement (and its sibling, Domain Type Enforcement) provides labels on principals and objects, and specifies the allowed object-verb-subject(class) tables. This is the model behind the familiar SELinux and SMACK.
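The core of Bell-LaPadula (and, mirrored, Biba) is small enough to sketch. Here is an illustrative Python version using integer levels (0 = unclassified up to 3 = top secret); real systems add compartments and categories on top of this:

    # Bell-LaPadula's two rules over totally ordered security levels.
    def may_read(subject_level: int, object_level: int) -> bool:
        """Simple security property: no read up."""
        return subject_level >= object_level

    def may_write(subject_level: int, object_level: int) -> bool:
        """*-property: no write down, so secrets can't leak to lower levels."""
        return subject_level <= object_level

    # Biba is the same shape with both comparisons flipped:
    # it protects integrity rather than confidentiality.

    assert may_read(3, 0) and not may_read(0, 3)    # cleared subjects read down
    assert may_write(0, 3) and not may_write(3, 0)  # nothing flows downward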
... and then there are some that incorporate the passage of time:
Chinese Wall was developed in business settings to separate employees within an organization that provides services to competitors in a given market: e.g., once Johnson has started working on the Exxon-Mobil account, he is not allowed access to the BP account; had Johnson started working on BP first, he would be denied access to Exxon-Mobil's data (a sketch follows this list).
LOMAC and high-watermark are two dynamic approaches: LOMAC modifies the privileges of processes as they access progressively-higher levels of data, and forbids writing to lower levels (processes migrate towards "top security"), and high-watermark modifies the labels on data as higher-levels of processes access it (data migrates towards "top security").
Clark-Wilson models are very open-ended; they include invariants and rules to ensure that every state transition preserves the invariants. (This can be as simple as double-entry accounting or as complex as HIPAA.) Think database transactions and constraints.
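Chinese Wall is also easy to sketch, since the only state it needs is each principal's access history. An illustrative Python version (the conflict classes and names are invented):

    # Conflict classes group datasets belonging to competitors.
    conflict_classes = [{"exxon-mobil", "bp"}, {"coke", "pepsi"}]
    history = {}  # consultant -> set of datasets already accessed

    def may_access(consultant: str, dataset: str) -> bool:
        accessed = history.get(consultant, set())
        for cls in conflict_classes:
            if dataset in cls and accessed & (cls - {dataset}):
                return False  # already touched a competitor in this class
        return True

    def access(consultant: str, dataset: str) -> None:
        if not may_access(consultant, dataset):
            raise PermissionError(f"{consultant} is walled off from {dataset}")
        history.setdefault(consultant, set()).add(dataset)

    access("johnson", "exxon-mobil")
    assert not may_access("johnson", "bp")  # the wall has gone up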
Matt Bishop's "Computer Security: Art and Science" is definitely worth reading if you'd like more depth on the published models.

I prefer RBAC. You may find it very similar to ACLs, but the two differ semantically.
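To show the semantic difference: in RBAC, permissions attach to roles and users gain them only through role membership, whereas an ACL attaches permissions directly to the principal/object pair. A minimal sketch (the roles and permission names are invented):

    # RBAC: permissions hang off roles, never directly off users.
    role_permissions = {
        "admin": {"user:create", "user:delete", "report:read"},
        "analyst": {"report:read"},
    }
    user_roles = {"alice": {"admin"}, "bob": {"analyst"}}

    def has_permission(user: str, permission: str) -> bool:
        return any(permission in role_permissions.get(role, set())
                   for role in user_roles.get(user, set()))

    assert has_permission("bob", "report:read")
    assert not has_permission("bob", "user:delete")

Revoking a role from a user, or a permission from a role, updates every affected user at once; that indirection is what plain ACLs lack.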

Go through the following links:
http://developer.android.com/guide/topics/security/security.html
http://technet.microsoft.com/en-us/library/cc182298.aspx

Related

How to handle hard aggregate-wide constraints in DDD/CQRS?

I'm new to DDD and I'm trying to model and implement a simple CRM system based on DDD, CQRS and event sourcing to get a feel for the paradigm. I have, however, run into some difficulties that I'm not sure how to handle. I'm not sure if my difficulties stem from me not having modeled the domain properly or from missing something else.
For a basic illustration of my problems, consider that my CRM system has the aggregate CustomerAggregate (which seems reasonable to me). The purpose of this aggregate is to make sure each customer is consistent and that its invariants hold up (name is required, social security number must be in the correct format, etc.). So far, all is well.
When the system receives a command to create a new customer, however, it needs to make sure that the social security number of the new customer doesn't already exist (i.e. it must be unique across the system). This is, of course, not an invariant that can be enforced by the CustomerAggregate aggregate, since customers don't have any information regarding other customers.
One suggestion I've seen is to handle this kind of constraint in its own aggregate, e.g. SocialSecurityNumberUniqueAggregate. If the social security number is not already registered in the system, the SocialSecurityNumberUniqueAggregate publishes an event (e.g. SocialSecurityNumberOfNewCustomerWasUniqueEvent) which the CustomerAggregate subscribes to and publishes its own event in response to this (e.g. CustomerCreatedEvent). Does this make sense? How would the CustomerAggregate respond to, for example, a missing name or another hard constraint when responding to the SocialSecurityNumberOfNewCustomerWasUniqueEvent?
The search term you are looking for is set-validation.
Relational databases are really good at domain agnostic set validation, if you can fit the entire set into a single database.
But, that comes with a cost; designing your model that way restricts your options on what sorts of data storage you can use as your book of record, and it splits your "domain logic" into two different pieces.
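For illustration, here is the relational approach in miniature, using Python's built-in sqlite3 (the schema is invented for the example). The UNIQUE constraint makes the database the arbiter of the set:

    import sqlite3

    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, ssn TEXT UNIQUE)")

    db.execute("INSERT INTO customer (ssn) VALUES (?)", ("000-00-0001",))
    try:
        # A second insert with the same SSN violates the set invariant.
        db.execute("INSERT INTO customer (ssn) VALUES (?)", ("000-00-0001",))
    except sqlite3.IntegrityError:
        print("duplicate SSN rejected by the database")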
Another common choice is to ignore the conflicts when you are running your domain logic (after all, what is the business value of this constraint?) but to instead monitor the persisted data looking for potential conflicts and escalate to a human being if there seems to be a problem.
You can combine the two (ex: check for possible duplicates via query when running the domain logic, and monitor the results later to mitigate against data races).
But if you need to maintain an invariant over a set, and you need that to be part of your write model (rather than separated out into your persistence layer), then you need to lock the entire set when making changes.
That could mean having a "registry of SSN assignments" that is an aggregate unto itself, and you have to start thinking about how much other customer data needs to be part of this aggregate, vs how much lives in a different aggregate accessible via a common identifier, with all of the possible complications that arise when your data set is controlled via different locks.
There's no rule that says all of the customer data needs to belong to a single "aggregate"; see Mauro Servienti's talk All Our Aggregates are Wrong. Trade-offs abound.
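As a rough sketch of what such a registry of SSN assignments might look like (the class and method names are mine, not a prescribed pattern), with a single lock standing in for whatever concurrency control your persistence gives you:

    import threading

    class SsnRegistry:
        """Aggregate owning the whole set of assignments; one lock serializes changes."""
        def __init__(self) -> None:
            self._assignments = {}  # ssn -> customer id
            self._lock = threading.Lock()

        def claim(self, ssn: str, customer_id: str) -> None:
            with self._lock:  # first writer wins; the set is never split
                if ssn in self._assignments:
                    raise ValueError(f"SSN {ssn} is already assigned")
                self._assignments[ssn] = customer_id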
One thing you want to be very cautious about in your modeling, is the risk of confusing data entry validation with domain logic. Unless you are writing domain models for the Social Security Administration, SSN assignments are not under your control. What your model has is a cached copy, and in this case potentially a corrupted copy.
Consider, for example, a data set that claims:
000-00-0000 is assigned to Alice
000-00-0000 is assigned to Bob
Clearly there's a conflict: both of those claims can't be true if the social security administration is maintaining unique assignments. But all else being equal, you can't tell which of these claims is correct. In particular, the suggestion that "the claim you happened to write down first must be the correct one" doesn't have a lot of logical support.
In cases like these, it often makes sense to hold off on an automated judgment, and instead kick the problem to a human being to deal with.
Although they are mechanically similar in a lot of ways, there are important differences between "the set of our identifier assignments should have no conflicts" and "the set of known third party identifier assignments should have no conflicts".
Do you also need to verify that the social security number (SSN) is really valid? Or are you just interested in verifying that no other customer aggregate with the same SSN can be created in your CRM system?
If the latter is the case, I would suggest having a CustomerService domain service which performs the whole SSN check by looking up the database (e.g. via a repository) and then creates the new customer aggregate (which again checks its own invariants, as you already mentioned). This whole process, the lookup of the existing SSN and the customer creation, needs to happen within one transaction to ensure consistency. As I consider this domain logic, a domain service is the perfect place for it. It does not hold data by itself but orchestrates the workflow which relates to the business requirement: that no two customers with the same SSN may be created in our CRM.
If you also need to verify that the social security number is real, you would need to perform a call to another service, I guess, or keep some cached data of SSNs in your CRM. In this case you could additionally have a SocialSecurityNumberService domain service which is injected into the CustomerService. This would just be an interface in the domain layer, but the implementation of this SocialSecurityNumberService interface would then reside in the infrastructure layer, where the access to whatever resource is required is implemented (be it a local cache you build in the background or some API call to another service).
Either way, all your logic for creating the new customer would be in one place, the CustomerService domain service. Additional checks that go beyond the Customer aggregate's boundaries would also be placed in this CustomerService.
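A sketch of that shape, under the assumption of a repository with an exists-by-SSN query and a unit of work that supplies the transaction (all names here are illustrative, not a prescribed API):

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Customer:
        name: str
        ssn: str

        def __post_init__(self) -> None:
            if not self.name:
                raise ValueError("name is required")  # the aggregate's own invariant

    class CustomerService:
        """Domain service orchestrating the cross-aggregate SSN check."""
        def __init__(self, repository, unit_of_work):
            self._repository = repository
            self._unit_of_work = unit_of_work

        def create_customer(self, name: str, ssn: str) -> Customer:
            # The lookup and the creation share one transaction for consistency.
            with self._unit_of_work:
                if self._repository.exists_with_ssn(ssn):
                    raise ValueError(f"a customer with SSN {ssn} already exists")
                customer = Customer(name=name, ssn=ssn)
                self._repository.add(customer)
                return customer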
Update
To also adhere to the nature of eventual consistency:
I guess since you went with event sourcing, you and your business have already accepted the eventually consistent nature of the system. This also means entries with the same SSN could happen. I think you could have some background job which continually checks for duplicate entries; depending on the complexity of your business logic, you might either be able to correct the duplicates automatically or need human intervention to do so. It really depends on how often this could happen.
If a hard constraint is that this must NEVER happen, maybe event sourcing is not the right way, at least for this part of your system...
Note: I also assume that command de-duplication is not the issue here but that you really have to deal with potentially different commands using the same SSN.

Should a single command address multiple aggregates?

In CQRS and DDD, an aggregate is a transactional boundary. Hence I have been modeling commands in such a way that each command only ever addresses a single aggregate. Of course, technically, it would be possible to write a command handler that addresses multiple aggregates, but that would not be within a single transaction and hence would not be consistent.
If I actually have to address multiple aggregates, I usually go with a process manager, but this sometimes feels like a lot of overhead. In addition, from my understanding, a process manager only ever reacts to domain events; it is not directly addressed by commands. So you need to decide which aggregate to use as the starting point.
I have seen that some people solve this using so-called domain or application services, which can receive commands as well, and then work on multiple aggregates – but in this case the transactional nature of the process gets lost.
To give a simple example, to better illustrate the scenario:
A user shall join a group.
A user has a max number of groups.
A group has a max number of users.
Where to put the command that triggers the initial joining process, and what to call it? user.join(group) feels as right or wrong as group.welcome(user). I'd probably go for the first one, because this is closer to the ubiquitous language, but anyway…
If I had something above the aggregates, like the aforementioned services, then I could run something such as:
userManagement.addUserToGroup(user, group);
However, this addUserToGroup function would then need to call both commands, which in turn means it has to take care of both commands being processed – which is somewhat counterintuitive to having separate aggregates at all, and having aggregates as transactional boundaries.
What would be the correct way to model this?
It may be worth reviewing Greg Young on Eventual Consistency and Set Validation.
What is the business impact of having a failure? This is the key question we need to ask and it will drive our solution in how to handle this issue as we have many choices of varying degrees of difficulty.
And certainly Pat Helland on Memories, Guesses, and Apologies.
Short version: the Two Generals Problem tells us that, if two pieces of information must be consistent, then we need to write both pieces of information in the same place. The "invariant" constrains our data model.
The invariant you describe is effectively a couple of set validation problems: the "membership" collection allows only so many members with user A, and only so many members with group B. And if you really are in a "we go out of business if those rules are violated" situation, then you cannot distribute the members of that set -- you have to lock the entire set when you modify it to ensure that the rule is not broken and that first writer wins.
An element that requires some care in your modeling: is the domain model the authority for membership? or is the "real world" responsible for membership and the domain model is just caching that information for later use? You want to be very careful about trying to enforce an invariant on the real world.
There's a risk that you end up over constraining the order in which information is accepted by the model.
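If the rule really is business-critical, "writing both pieces of information in the same place" can mean a single aggregate that owns every (user, group) pair and checks both counts under one lock. A sketch (the limits and names are invented):

    from collections import defaultdict
    import threading

    MAX_GROUPS_PER_USER = 5
    MAX_USERS_PER_GROUP = 100

    class MembershipSet:
        """One aggregate for the whole many-to-many set; one lock guards both invariants."""
        def __init__(self) -> None:
            self._groups_of = defaultdict(set)  # user -> groups
            self._users_of = defaultdict(set)   # group -> users
            self._lock = threading.Lock()

        def join(self, user: str, group: str) -> None:
            with self._lock:
                if len(self._groups_of[user]) >= MAX_GROUPS_PER_USER:
                    raise ValueError(f"{user} already belongs to the maximum number of groups")
                if len(self._users_of[group]) >= MAX_USERS_PER_GROUP:
                    raise ValueError(f"{group} is full")
                self._groups_of[user].add(group)
                self._users_of[group].add(user)

The cost, as noted above, is that every join in the system contends for the same lock.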
Essentially what you have is a many-to-many relationship between users and groups, with restrictions on both sides:
Restriction on the number of groups a user can join
Restriction on the number of users a group can have
VoiceOfUnreason already gave a great answer, so I'll share one way I've solved similar problems and go straight to the model and implementation in case you have to ensure that these constraints are enforced at all costs. If you don't have to, do not make the model and implementation that complex.
Ensuring consistency with such constraints on both Group and User entities will be difficult in a single operation because of the concurrency of the operations.
You can model this by adding a collection of RegisteredUsers to a Group or vice versa, adding a collection of JoinedGroups to a User, and enforce the constraint on one side, but enforcing it on the other side is still an issue.
What you can do is introduce another concept in your domain: the concept of a "Slot" in a Group. "Slots" are limited by the max number of Slots for a Group.
Then a User will issue a JoinGroupRequest that can be Accepted or Rejected.
A Slot can be either Taken or Reserved. Then you can introduce the concept of SlotReservation. The process of joining a User to a Group will be:
Issue a JoinGroupRequest from a User
Try to Reserve a Slot enforcing the MaxUsersPerGroup constraint.
Acquire a Slot or Reject the SlotReservation of a User enforcing the MaxGroupsPerUser constraint.
Accept or Reject the JoinGroupRequest depending on the outcome of the SlotReservation
If the SlotReservation is Rejected, another User will be able to use this Slot later.
For the implementation, you can add a SlotReservation queue per Group to ensure that once a Slot is freed after a rejected SlotReservation, the next User that wants to join the Group will be able to take it.
For the implementation, you can add a collection of Slots to a Group, or you can make Slot an aggregate in its own right.
You can use a Saga for this process. The Saga will be triggered when a JoinGroupRequest is made by a User.
Essentially, this operation becomes a Tentative Operation.
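A condensed sketch of the whole flow (the state names follow this answer's vocabulary; everything else is invented for illustration):

    from enum import Enum

    MAX_USERS_PER_GROUP = 100
    MAX_GROUPS_PER_USER = 5

    class Outcome(Enum):
        ACCEPTED = "accepted"
        REJECTED = "rejected"

    class Group:
        def __init__(self, name: str) -> None:
            self.name = name
            self.taken = set()  # Slots taken by users
            self.reserved = 0   # Slots held by pending SlotReservations

        def try_reserve_slot(self) -> bool:
            """Step 2: reserve a Slot, enforcing MaxUsersPerGroup."""
            if len(self.taken) + self.reserved >= MAX_USERS_PER_GROUP:
                return False
            self.reserved += 1
            return True

        def release_slot(self) -> None:
            """Compensation: a rejected SlotReservation frees the Slot."""
            self.reserved -= 1

        def take_slot(self, user: "User") -> None:
            self.reserved -= 1
            self.taken.add(user.name)

    class User:
        def __init__(self, name: str) -> None:
            self.name = name
            self.groups = set()

    def handle_join_group_request(user: User, group: Group) -> Outcome:
        """Saga sketch: reserve, then accept or compensate."""
        if not group.try_reserve_slot():
            return Outcome.REJECTED
        if len(user.groups) >= MAX_GROUPS_PER_USER:  # Step 3: MaxGroupsPerUser
            group.release_slot()
            return Outcome.REJECTED
        group.take_slot(user)
        user.groups.add(group.name)
        return Outcome.ACCEPTED

    assert handle_join_group_request(User("alice"), Group("chess")) is Outcome.ACCEPTED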
For more details, take a look at the Accountability Pattern, "Life Beyond Distributed Transactions: An Apostate's Opinion", and "Life Beyond Distributed Transactions: An Apostate's Implementation".

Where to put a common value object between two bounded contexts?

I'm trying to apply domain-driven design in a little project.
I want to separate the authentication from the rest of the user logic, and I'm using a value object for the id (userID).
In the beginning I had the userID at the same level (package) as the users, but I realized that when I separate the bounded contexts for authentication and user, they share a common value object for the userID. So my question is: where am I supposed to put common or shared value objects? Is it correct to create a package called commons?
It is recommended to not share models between bounded contexts, however, you can share IDs and even simple Value objects (like Money).
The general rule is that you may share anything that is immutable or that changes very infrequently and IDs rarely change structure (here immutability refers to the source code and value immutability).
Packages called "common" usually refer to reusable concepts that you use across your projects, so that you don't have to code them in every project. It's usual to put base abstract objects and interfaces there, for entities, value objects, etc.
But that's not your case with userId. What you are describing is putting userId in a "shared kernel" model. It is an option, but usually it is not recommended.
From my point of view:
The auth BC has the concepts id, login, password, role, etc.
The user BC has concepts like name, surname, address, age, etc., but it also has to have the id, which it gets from the auth BC.
From what I know, you have 2 options:
(1) Authentication BC shares "id" concept explicitly, i.e. it has a shared kernel. So "id" concept would belong to user BC model too.
(2) Authentication BC is a generic BC. So you have to integrate the user BC with the auth BC to get the id from it.
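Whichever option you choose, the shared piece should stay tiny and immutable, along these lines (a sketch; the factory method is just one convention):

    from dataclasses import dataclass
    import uuid

    @dataclass(frozen=True)  # immutable, so it is safe to share between BCs
    class UserId:
        value: str

        @staticmethod
        def new() -> "UserId":
            return UserId(str(uuid.uuid4()))

    # Both the auth BC and the user BC can hold a UserId
    # without depending on each other's entities.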
First of all, I personally see this as a context mapping question at the code level, where you are effectively looking to have a shared kernel between multiple bounded contexts.
This depends on the business really. You typically need to consider the following questions:
How do the different teams responsible for each bounded context collaborate with each other? If it's one team maintaining all the related bounded contexts, it might be OK, but if multiple teams have different objectives, that leads to the next point.
How do you plan to maintain this shared kernel over time? In general, this shared kernel will need to change to adapt to business changes.
For more detailed arguments, there are plenty of resources out there about context mapping, such as Vaughn Vernon's book "Implementing Domain Driven Design".

What persistence problems are solved with CQRS?

I've read a few posts relating to this, but I still can't quite grasp how it all works.
Let's say, for example, I was building a site like Stack Overflow, with two pages: one listing all the questions, another where you ask/edit a question. A simple, CRUD-based web application.
If I used CQRS, I would have a separate system for the reads/writes, separate DBs, etc. Great.
Now, my issue comes down to how to update the read state (which is, after all, in a DB of its own).
Flow i assume is something like this:
WebApp => User submits question
WebApp => System raises 'Write' event
WriteSystem => 'Write' event is picked up and saves to 'WriteDb'
WriteSystem => 'UpdateState' event raised
ReadSystem => 'UpdateState' event is picked up
ReadSystem => System updates its own state ('ReadDb')
WebApp => Index page reads data from 'Read' system
Assuming this is correct, how is this significantly different from a CRUD system reading/writing from the same DB? Putting aside CQRS advantages like separate read/write system scaling, rebuilding state, separation of domain boundaries, etc., what problems are solved from a persistence standpoint? Is lock contention avoided?
Couldn't I achieve a similar advantage by either using queues to achieve single-threaded saves in a multi-threaded web app, or by simply replicating data between a read DB and a write DB?
Basically, I'm just trying to understand why, if I was building a CRUD-based web application, I would care about CQRS from a pragmatic standpoint.
Thanks!
Assuming this is correct, how is this significantly different from a CRUD system reading/writing from the same DB? Putting aside CQRS advantages like separate read/write system scaling, rebuilding state, separation of domain boundaries, etc., what problems are solved from a persistence standpoint? Is lock contention avoided?
The problem here is:
"Putting aside CQRS advantages …"
If you take away its advantages, it's a little bit difficult to argue what problems it solves ;-)
The key in understanding CQRS is that you separate reading data from writing data. This way you can optimize the databases as needed: Your write database is highly normalized, and hence you can easily ensure consistency. Your read database in contrast is denormalized, which makes your reads extremely simple and fast: They all become SELECT * FROM … effectively.
Under the assumption that a website like Stack Overflow is read from far more often than it is written to, this makes a lot of sense, as it allows you to optimize the system for fast responses and a great user experience without sacrificing consistency.
Additionally, if combined with event-sourcing, this approach has other benefits, but for CQRS alone, that's it.
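A toy sketch of the split, with Python's built-in sqlite3 standing in for both stores (the event flow and schema are invented for the example): the write side records the change, a projector denormalizes it, and the index page becomes a trivial query against the read model.

    import sqlite3

    write_db = sqlite3.connect(":memory:")  # normalized, guards consistency
    read_db = sqlite3.connect(":memory:")   # denormalized, shaped for queries

    write_db.execute("CREATE TABLE question (id INTEGER PRIMARY KEY, title TEXT, author TEXT)")
    read_db.execute("CREATE TABLE question_list (id INTEGER, summary TEXT)")

    def submit_question(title: str, author: str) -> None:
        cur = write_db.execute(
            "INSERT INTO question (title, author) VALUES (?, ?)", (title, author))
        # The 'UpdateState' step: project the change into the read model.
        project_question_created(cur.lastrowid, title, author)

    def project_question_created(qid: int, title: str, author: str) -> None:
        read_db.execute("INSERT INTO question_list VALUES (?, ?)",
                        (qid, f"{title} (asked by {author})"))

    def list_questions():
        # The index page is a trivial SELECT against the read model.
        return read_db.execute("SELECT summary FROM question_list").fetchall()

    submit_question("What persistence problems are solved with CQRS?", "alice")
    print(list_questions())

In a real deployment the projector would consume events asynchronously, which is where the eventual consistency between the two models comes from.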
Shameless plug: My team and I have created a comprehensive introduction to CQRS, DDD and event-sourcing, maybe this helps to improve understanding as well. See this website for details.
A good starting point would be to review Greg Young's 2010 essay, where he tries to clarify the limited scope of the CQRS pattern.
CQRS is simply the creation of two objects where there was previously only one.... This separation however enables us to do many interesting things architecturally, the largest is that it forces a break of the mental retardation that because the two use the same data they should also use the same data model.
The idea of multiple data models is key, because you can now begin to consider using data models that are fit for purpose, rather than trying to tune a single data model to every case that you need to support.
Once we have the idea that these two objects are logically separate, we can start to consider whether they are physically separate. And that opens up a world of interesting trade offs.
what problems are solved from a persistence standpoint?
The opportunity to choose fit for purpose storage. Instead of supporting all of your use cases in your single read/write persistence store, you pull documents out of the key value store, and run graph queries out of the graph database, and full text search out of the document store, events out of the event stream....
Or not! If the cost-benefit analysis tells you the work won't pay off, you have the option of serving all of your cases from a single store.
It depends on your application's needs.
A good overview and links to more resources here: https://learn.microsoft.com/en-us/azure/architecture/patterns/cqrs
When to use this pattern:
Use this pattern in the following situations:
Collaborative domains where multiple operations are performed in parallel on the same data. CQRS allows you to define commands with enough granularity to minimize merge conflicts at the domain level (any conflicts that do arise can be merged by the command), even when updating what appears to be the same type of data.
Task-based user interfaces where users are guided through a complex process as a series of steps or with complex domain models.
Also useful for teams already familiar with domain-driven design (DDD) techniques. The write model has a full command-processing stack with business logic, input validation, and business validation to ensure that everything is always consistent for each of the aggregates (each cluster of associated objects treated as a unit for data changes) in the write model. The read model has no business logic or validation stack and just returns a DTO for use in a view model. The read model is eventually consistent with the write model.
Scenarios where performance of data reads must be fine-tuned separately from performance of data writes, especially when the read/write ratio is very high, and when horizontal scaling is required. For example, in many systems the number of read operations is many times greater than the number of write operations. To accommodate this, consider scaling out the read model, but running the write model on only one or a few instances. A small number of write model instances also helps to minimize the occurrence of merge conflicts.
Scenarios where one team of developers can focus on the complex domain model that is part of the write model, and another team can focus on the read model and the user interfaces.
Scenarios where the system is expected to evolve over time and might contain multiple versions of the model, or where business rules change regularly.
Integration with other systems, especially in combination with event sourcing, where the temporal failure of one subsystem shouldn't affect the availability of the others.
This pattern isn't recommended in the following situations:
Where the domain or the business rules are simple.
Where a simple CRUD-style user interface and the related data access operations are sufficient.
For implementation across the whole system. There are specific components of an overall data management scenario where CQRS can be useful, but it can add considerable and unnecessary complexity when it isn't required.

Integration between various Domain Driven Design systems

I have recently been adopting Domain Driven Design principles, but I'm having a bit of trouble implementing Bounded Contexts and the integration between the contexts and/or other systems.
For example, take the following systems:
Warehouse/Stock Keeping System
Entities would include 'Product' which would have properties such as 'Quantity', 'Location'
Online Ordering System
Entities would include 'Order', 'OrderLine', and 'Basket'. Would it also have its own Product entity which would have properties such as 'Price'?
One clear business rule for the Ordering System is that an order cannot be placed for a product that is out of stock, but this information is within the Stock Keeping System. From what I understand, these are some possible ways of implementing this:
When the order is validated, the Order object calls a service in the Stock Keeping System to check there is enough stock of each required product. However, something does not feel right about the domain calling an application service of another system, and if all systems were doing this it would result in everything being closely coupled and intertwined.
The Ordering System reads from the database of the Stock Keeping System: The Product entity in the Ordering System is mapped to a join of the Product table in the Ordering System and the Product table in the Stock Keeping System, or the Ordering System Product entity contains another entity called StockKeepingProduct which has the values from the Stock Keeping System. This would be easy to perform the validation on but it would have to be ensured the Ordering System never writes to the Stock Keeping System's database.
The quantity of stock is denormalised into the Ordering System's database and whenever the Stock Keeping System's stock changes it sends out a message to the Ordering System to update its stock.
Probably deep down I know I should be doing 3, but I'm not sure we're quite ready to handle so much redundant data and the possible inconsistencies. What are your opinions on 1 and 2? Or do you have any other suggestions?
It also depends on the infrastructure. If you have the two systems running in one network, so that there is minimal chance of communication disruptions, I don't see a problem with solution 1. You can wrap the call to the Stock Keeping System in an adapter and easily exchange it in the future, in case you decide to change the API of the Stock Keeping System or replace the system completely. It also avoids leaking its details into the Ordering System.
Solution 3 is more advanced and requires more resources to implement and maintain, but it allows for complete separation of the two systems. It better handles network disruptions and performance bottlenecks, such as when the Ordering System needs to handle more requests than the Stock Keeping System can cope with.
But again, it could be implemented the same way as 1), separated using adapters; IMHO, from a DDD perspective there is no difference.
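A sketch of the adapter idea from solution 1 (all names are invented): the Ordering System depends only on a port it owns, and the Stock Keeping System's API hides behind one implementation of it.

    from abc import ABC, abstractmethod

    class StockAvailability(ABC):
        """Port owned by the Ordering System; its domain sees only this."""
        @abstractmethod
        def in_stock(self, product_id: str, quantity: int) -> bool: ...

    class StockKeepingSystemAdapter(StockAvailability):
        """Adapter translating the port into calls on the other system."""
        def __init__(self, stock_api):
            self._api = stock_api  # HTTP client, message queue, etc.

        def in_stock(self, product_id: str, quantity: int) -> bool:
            return self._api.get_quantity(product_id) >= quantity

Replacing the Stock Keeping System later means writing a new adapter; the Ordering System's domain code is untouched.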
