How to handle concurrent constraints across aggregate roots - domain-driven-design

I'm afraid I already know the answer, but I'm hoping that somebody can provide an alternative solution that haven't found before. As always doing DDD according to Effective Aggregate Design is more difficult than I thought, but here's my scenario.
We have two ARs, User and RoleGroup
A User can be granted a particular RoleGroup and thereby obtains the permissions provided by the Roles (a collection value object) in that role group. The identity of the role group is kept in the User AR as another VA.
When a RoleGroup is removed from the system, we raise a domain event that a handler uses to find all users referring to that RoleGroup and to remove the reference. The corresponding projection denormalizer will use that same event to update the effective roles of the User. This is a combination of the individual roles granted to that User and the roles of all granted RoleGroups.
This doesn't have to be transactional (iow it can be eventually consistent).
We use Event Sourcing using Jonathan Oliver's EventStore 3.0 and elements from Lokad.CQRS and NCQRS.
So, in theory, when one request (it's an ASP.NET MVC app) is executing the scenario mentioned above, it is possible that another request is granting that same RoleGroup to a User. If that happens just after the above mentioned domain event handler scans for users related to that RoleGroup, that request will complete. At that point you have a RoleGroup that is deleted (albeit not physically) and a User that is still holding the identity of that RoleGroup.
How do you prevent this? We're currently looking at making the identity of the Users granted a particular RoleGroup part of that RoleGroup AR, so that deleting a RoleGroup and granting it to a user will cause a optimistic concurrency conflict. But somehow, this doesn't feel like the correct solution.

This is similar to how uniqueness constraints CAN be solved.
Suppose there's a projection with both rolegroups and users that has SERIAL behavior. When rolegroups get archived (i.e. they can no longer be used), the reactive bits sitting on top of the projection can notify all the users that have been granted said rolegroup that they are no longer part of it. When concurrently this archived rolegroup is granted to a user (or a set of), the serial nature of the projection can be leveraged to tell this user too that they are no longer part of the group.
All that said, this is just housekeeping. It's only when the rolegroups and users get used that a correct view is important. Since I presume rolegroups will carry an IsArchived bit, I can safely filter them out at that time, without worrying about some dangling edge-case for which we still have to prove that it has to be resolved in an automated way.
As an aside, scanning the event log would also reveal this situation, i.e. are there any users granted a rolegroup that was archived before that point in time (or around that point in time)? An admin could resolve this by issueing a compensating command to the user aggregate.
"It depends" TM
Edit: I've given a technical solution to this problem. I would encourage other readers to explore different ways of modeling & solving these kinds of problems. Sometimes, perhaps even most of the times, the answer isn't technical at all. YMMV.

Why are you adding a RoleGroup reference to the User aggregate? Are there any invariants on the User that use this information?
I imagine this can be modelled much simpler by granting the RoleGroup to the User via the RoleGroup aggregate, emitting something like a RoleGroupGrantedToUser event. When a RoleGroup is removed it emits a RoleGroupRemoved event. After this event the RoleGroup no longer accepts new Users.

To prevent this you can make it transactional i.e. make it impossible to grant a deleted RoleGroup to a User by some locking mechanism. But that would only complicate things and, as you noted, is not required.
When you assign a RoleGroup to a User, I suppose you have a similar projection denormalizer to update the user's effective roles as well. You could check there if a granted RoleGroup still exists at that point, and if it doesn't, remove the reference from the User. Then the effective roles on a user should eventually be consistent.

Related

How to handle relationships between aggregate roots in resolvejs

I'm having trouble figuring out how to handle some basic stuff around relationships between aggregate roots in resolvejs. The basic question is how do I handle the integrity of relationships? To do so, it seems like you need knowledge of both sides at the same time, but that doesn't seem to be allowed in the write side.
Heres the setup: I am trying to build a user management tool and I have two aggregate roots, User and Organisation. I need to allow both to exist independently of each other and define an access relationship between them (i.e. a user can have access to any number of organisation).
If the relationship belongs to the User, I can create a create a command like grantAccessToOrganisation on the User aggregate which takes an organisationId but this raises a couple questions. How do I make sure that the organisationId provided is real one? It seems like it needs to happen in the command handler but since the command belongs to the User aggregate, I don't have access to the Organisation aggregate. Also, how should I handle when an organisation gets removed? It seems like that should have a side effect on all users who have access to it but I don't seem to have a good way of making that query on the write side.
Try not to think of aggregate as an "entity" in traditional systems. Choose aggregate root as a transactional and consistency boundary.
This means that all commands to the given aggregate are sequental, its state is consistent, meaning you can be sure that your command is applied to expected aggregate state and no chages is being done by another user or process.
As an extreme example, you can even have a single aggregate "System", and the whole system state will be accessible to you. But this mean the size of the state will be enourmous, and each command will lock the whole system.
So choose your aggregate large enough to control its transactions and small enough not to block other transactions.
In your example, I can guess that User is more about identity, login, profile, avatar - things like that. It can live without knowledge of Organisation and access rights to it. Organisation is the aggregate that deals with access rights and changing access rights is a transaction that affect the single organisation.
So I would send grantAccess command to the Organisation, not to the user in your example. But of course it depends on other requirements, and I could be wrong here.
Also, there always will be some inter-aggregate business rules that can be implemented with saga. An example would be a rule that if User login is disabled, its access rights are removed after 30 days. Saga is a long-running business transaction that can affect several aggregates.

DDD repository input parameters

Which is suggested way to implement _customRoleRepository in the following examples?
Following code is executed in the application service.
var user = _userRepository.GetById(1);
var customRole = _customRoleRepository.GetById(user.CustomRoleId);
or
var user = _userRepository.GetById(1);
var customRole = _customRoleRepository.GetForUser(user);
Given the two options I would probably go for the first one, which keeps consistency of accessing by an ID.
If possible, it might be preferable to load the custom role when you load the user to avoid another round trip to the database, especially if this is a common operation. This could be implemented as a read model.
This is presuming you have modelled your aggregates correctly... :)
Hate to say it but in a DDD environment, my answer would be neither.
In your first example, the role repository can be ignorant of the user domain which is good but it means the application needs to know that to get the role it needs to pull an id out of the user and then query another repository. In other words, the application is acting as a mapper between user and role.
In the second example, the roles repository now needs to know about the user domain. Not great but on the other hand the application no longer needs to know about roleId. So that is good. Classic sort of trade off between the two approaches.
But in both cases the application still needs two repositories to get it's information. What happens when more relations are needed? The number of repositories can quickly grow and things become a mess.
In Domain Driven Design you should try to think in terms of aggregate roots(AR) and domain contexts. For your example context, the user is an AR and the role becomes a child. So you might have:
var user = _userFinder.GetById(1);
var customRole = user.CustomRole;
That hides most of the implementation details from you application and allows you to focus on what your domain entities actually need to do.
Both are equally valid, depending on your needs. GetForUser would be good if you want to ensure the calling code has a valid User aggregate before you try and retrieve the roles - while it does couple the customRoleRepository to knowledge of the User aggregate, if you want to require the calling code to have a valid User aggregate, then that coupling has a purpose.
GetByUserId is more consistent with GetById and has less coupling, so if in your context it doesn't matter to call GetByUserId even if the client doesn't have a valid User aggregate, then that's fine too.
If you are loading ById, I've also found using typed identity valueobjects can be quite helpful in providing an extra level of type safety - some conversation about the pros and cons here https://stackoverflow.com/a/5377460/6720449 and here https://groups.google.com/forum/#!topic/dddcqrs/WQ9zRtW3Gbg

Guidelines to decide when a domain role needs to be explicitly modelled

I looking for some guidelines as to when one must explicitly model a role in the domain model.
I will explain my current stance with the help of an example here.
Say we are building a health care system, and the business requirement states
"That only doctors with 3+ years of experience and certain
qualifications can perform surgeries"
In this case it is evident that the act of doing a surgery can only be performed by a person playing the role of a doctor and the doctor needs to meet certain prerequisites to perform the action
docter.performSurgery()
So basically all doctors are not the same
This method will probably check if the preconditions are met
So in the above cases, I will model the role explicitly.
Now lets consider the alternate scenario.
Only a admin can approve of a funds transfer
In the above case I do not find any need to model this role in domain, as their are no rules distinguishing one admin from another in my domain.
Any person/userlogin with the permission of admin can perform this action, I would rather design this into my security infrastructure and ensure that the
approveTransfer() method invoked on the application layer is invoked only if the currently logged in user has the ADMIN permission.
So the "domain model" by which i mean classes like the Account class is unaware of this rule, this is codified in the application layer either via AOP or probably the AccountService class or the like.
What do the wise men have to say about this ? :)
When designing aggregates I always ask myself an important question.
What is the consistency boundary for the process I'm attempting to model?
I ask what rules must be applied during any one atomic operation. This is referred to as a transactional boundary, and its your bread butter when defining your invariants (rules that must always be true during the lifetime of an atomic operation - start to end).
As I see it, the rule that a doctor/surgeon must have n years of experience - for a particular operation - is an invariant that must always be transactionally consistent (such as when performing a surgery). Therefor it should be modeled as a transactional boundary within a single aggregate.
Because aggregates can only guarantee the consistency within themselves, the invariants it is responsible for should not be leaked outside of it. In my opinion, and assuming that doctor is an aggregate, a separate roles model should not be responsible for an invariant that the doctor's model itself should be responsible for.
Aggregate to aggregate relationships should really only be established to provide 'help' in giving some missing piece of information. But the rules and how that information is interpreted should be isolated within its respective aggregate.
A separate situation can arise for user authentication. You may have a bounded context with a Customer model, but the details of permissions, authentication, and roles are so vast as to require an entirely separate system to deal with. In this case you may end up creating a separate bounded context for User Roles and Permission and linking the two bounded contexts. In this scenario you could have a domain service that deals with the communication between the two. Call the Customer root with an operation and pass in the domain service for some intention revealing double dispatch and let the domain service resolve wheather that operation goes through or not. In this scenario though, the responsibilities of user auth is not the Customer's at all. The Customer simply doesn't care (because it cannot itself guarantee the transaction) and its up to User Auth and Roles to decide what to do.
From: Implementing Domain Driven Design - Vaughn Vernon
A properly designed Aggregate is one that can be modified in any way required by the business with its invariants completely consistent within a single transaction. And a properly designed Bounded Context modifies only one Aggregate instance per transaction in all cases. What is more, we cannot correctly reason on Aggregate design without applying transactional analysis.

DDD - can a repository fetch an aggregate by something other than its identifier?

I model a User as an aggregate root and a User is composed of an Identifier value object as well as an Email value object. Both value objects can uniquely identify a User, however the email is allowed to change and the identifier cannot.
In most examples of DDD I have seen, a repository for an aggregate root only fetches by identifier. Would it be correct to add another method that fetches by email to the repository? Am I modeling this poorly?
I would say yes, it is appropriate for a repository to have methods for retrieving aggregates by something other than the identity. However, there are some subtleties to be aware of.
The reason that many repository examples only retrieve by ID is based on the observation that repositories coupled with the structure of aggregates cannot fulfill all query requirements. For instance, if you have a query which calls for some fields from an aggregate as well as some fields for a referenced aggregate and some summary data, the corresponding aggregate classes cannot be used to represent this data. Instead, a dedicated read-model is needed. Therefore, querying responsibilities are decoupled from the repository. This have several advantages (queries can be served by a dedicated de-normalized store) and it is the principal paradigm of CQRS. In this type of architecture, domain classes are only retrieved by the repository when some behavior needs to execute. All read-only use cases are served by a read-models.
The reason that I think it appropriate for a repository to have a GetByEmail method is based on YAGNI and battling complexity. You an allow your application to evolve as requirements change and grow. You don't need to jump to CQRS and separate read/write stores right away. You can start with a repository that also happens to have a query method. The only thing to keep in mind is that you should try to retrieve entities by ID when you need to invoke some behavior on those entities.
I would put this functionality into a service / business layer that is specific to your User object. Not every object is going to have an Email identifier. This seems more like business logic than the responsibility of the repository. I am sure you already know this, but here is good explanation of what I am talking about.
I would not recommend this, but you could have a specific implementation of your repository for a User that exposes a GetByEmail(string emailAddress) method, but I still like the service idea.
I agree with what eulerfx has answered:
You need to ask yourself why you need to get the AR using something
other than the ID.
I think it would be rather obvious that you do not have the ID but you do have some other unique identifier such as the e-mail address.
If you go with CQRS you need to first determine whether the data is important to the domain or only to the query store. If you require the data to be 100% consistent then it changes things slightly. You would, for instance, need 100% consistency if you are checking whether an e-mail address exists in order to satisfy the unique constraint. If the queried data is at any time stale you will probably run into problems.
Remember that a repository represents a collection of sorts. So if you do not need to actually operate on the AR (command side) but you have decided that where you are using your domain is appropriate then you could always go for a ContainsEMailAddress on the repository; else you could have a query side for your domain data store also since your domain data store (OLTP type store) is 100% consistent whereas your query store (OLAP type store) may only be eventually consistent, as is typical of CQRS with a separate query store.
In most examples of DDD I have seen, a repository for an aggregate
root only fetches by identifier.
I'd be curious to know what examples you've looked at. According to the DDD definition, a Repository is
A mechanism for encapsulating storage, retrieval, and search behavior
which emulates a collection of objects.
Search obviously includes getting a root or a collection of roots by all sorts of criteria, not only their ID's.
Repository is a perfect place for GetCustomerByEmail(), GetCustomersOver18(), GetCustomersByCountry(...) and so on.
Would it be correct to add another method that fetches by email to the repository? - I would not do that. In my opinion a repository should have only methods for getting by id, save and delete.
I'd rather ask why you don't have user id in the command handler in which you want to fetch the user and call a domain method on it. I don't know what exactly you are doing, but for the login/register scenario, I would do following. When a user logs in, he passes an email address and a password, and you do a query to authenticate the user - this would not use domain or repository (that is just for commands), but would use some query implementation which would return some UserDto which would contain user id, from this point you have the user id. Next scenario is registration. The command handler to create a new user would create a new user entity, then the user needs to log in.

Authorization System Design Question

I'm trying to come up with a good way to do authentication and authorization. Here is what I have. Comments are welcome and what I am hoping for.
I have php on a mac server.
I have Microsoft AD for user accounts.
I am using LDAP to query the AD when the user logs in to the Intranet.
My design question concerns what to do with that AD information. It was suggested by a co-worker to use a naming convention in the AD to avoid an intermediary database. For example, I have a webpage, personnel_payroll.php. I get the url and with the URL and the AD user query the AD for the group personnel_payroll. If the logged in user is in that group they are authorized to view the page. I would have to have a group for every page or at least user domain users for the generic authentication.
It gets more tricky with controls on a page. For example, say there is a button on a page or a grid, only managers can see this. I would need personnel_payroll_myButton as a group in my AD. If the user is in that group, they get the button. I could have many groups if a page had several different levels of authorizations.
Yes, my AD would be huge, but if I don't do this something else will, whether it is MySQL (or some other db), a text file, the httpd.conf, etc.
I would have a generic php funciton IsAuthorized for the various items that passes the url or control name and the authenticated user.
Is there something inherently wrong with using a naming convention for security like this and using the AD as that repository? I have to keep is somewhere. Why not AD?
Thank you for comments.
EDIT: Do you think this scheme would result in super slow pages because of the LDAP calls?
EDIT: I cannot be the first person to ever think of this. Any thoughts on this are appreciated.
EDIT: Thank you to everyone. I am sorry that I could not give you all more points for answering. I had to choose one.
I wonder if there might be a different way of expressing and storing the permissions that would work more cleanly and efficiently.
Most applications are divided into functional areas or roles, and permissions are assigned based on those [broad] areas, as opposed to per-page permissions. So for example, you might have permissions like:
UseApplication
CreateUser
ResetOtherUserPassword
ViewPayrollData
ModifyPayrollData
Or with roles, you could have:
ApplicationUser
ApplicationAdmin
PayrollAdmin
It is likely that the roles (and possibly the per-functionality permissions) may already map to data stored in Active Directory, such as existing AD Groups/Roles. And if it doesn't, it will still be a lot easier to maintain than per-page permissions. The permissions can be maintained as user groups (a user is either in a group, so has the permission, or isn't), or alternately as a custom attribute:
dn: cn=John Doe,dc=example,dc=com
objectClass: top
objectClass: person
objectClass: webAppUser
cn: John Doe
givenName: John
...
myApplicationPermission: UseApplication
myApplicationPermission: ViewPayrollData
This has the advantage that the schema changes are minimal. If you use groups, AD (and every other LDAP server on the planet) already has that functionality, and if you use a custom attribute like this, only a single attribute (and presumably an objectClass, webAppUser in the above example) would need to be added.
Next, you need to decide how to use the data. One possibility would be to check the user's permissions (find out what groups they are in, or what permissions they have been granted) when they log in and store them on the webserver-side in their session. This has the problem that permissions changes only take effect at user-login time and not immediately. If you don't expect permissions to change very often (or while a user is concurrently using the system) this is probably a reasonable way to go. There are variations of this, such as reloading the user's permissions after a certain amount of time has elapsed.
Another possibility, but with more serious (negative) performance implications is to check permissions as needed. In this case you end up hitting the AD server more frequently, causing increased load (both on the web server and AD server), increased network traffic, and higher latency/request times. But you can be sure that the permissions are always up-to-date.
If you still think that it would be useful to have individual pages and buttons names as part of the permissions check, you could have a global "map" of page/button => permission, and do all of your permissions lookups through that. Something (completely un-tested, and mostly pseudocode):
$permMap = array(
"personnel_payroll" => "ViewPayroll",
"personnel_payroll_myButton" => "EditPayroll",
...
);
function check_permission($elementName) {
$permissionName = $permMap[$elementName];
return isUserInLdapGroup($user,$permissionName);
}
The idea of using AD for permissions isn't flawed unless your AD can't scale. If using a local database would be faster/more reliable/flexible, then use that.
Using the naming convention for finding the correct security roles is pretty fragile, though. You will inevitably run into situations where your natural mapping doesn't correspond to the real mapping. Silly things like you want the URL to be "finbiz", but its already in AD as "business-finance" - do you duplicate the group and keep them synchronized, or do you do the remapping within your application...? Sometimes its as simple as "FinBiz" vs "finbiz".
IMO, its best to avoid that sort of problem to begin with, e.g, use group "185" instead of "finbiz" or "business-finance", or some other key that you have more control over.
Regardless of how your getting your permissions, if end up having to cache it, you'll have to deal with stale cache data.
If you have to use ldap, it would make more sense to create a permissions ou (or whatever the AD equivalent of "schema" is) so that you can map arbitrary entities to those permissions. Cache that data, and you should ok.
Edit:
Part of the question seems to be to avoid an intermediary database - why not make the intermediary the primary? Sync the local permissions database to AD regularly (via a hook or polling), and you can avoid two important issues 1) fragile naming convention, 2) external datasource going down.
You will have very slow pages this way (it sounds to me like you'll be re-querying AD LDAP every time a user navigates to figure out what he can do), unless you implement caching of some kind, but then you may run into volatile permission issues (revoked/added permissions on AD while you didn't know about it).
I'd keep permissions and such separate and not use AD as the repository to manage your application specific authorization. Instead use a separate storage provider which will be much easier to maintain and extend as necessary.
Is there something inherently wrong with using a naming convention for security like
this and using the AD as that repository? I have to keep is somewhere. Why not AD?
Logically, using groups for authorization in LDAP/AD is just what it was designed for. An LDAP query of a specific user should be reasonably fast.
In practice, AD can be very unpredictable about how long data changes take to replicate between servers. If someday your app ends up in a big forest with domain controllers distributed all over the continent, you will really regret putting fine-grained data into there. Seriously, it can take an hour for that stuff to replicate for some customers I've worked with. Mysterious situations arise where things magically start working after servers are rebooted and the like.
It's fine to use a directory for 'myapp-users', 'managers', 'payroll' type groups. Try to avoid repeated and wasteful LDAP lookups.
If you were on Windows, one possibility is to create a little file on the local disk for each authorized item. This gives you 'securable objects'. Your app can then impersonate the user and try to open the file. This leverages MS's big investment over the years on optimizing this stuff. Maybe you can do this on the Mac somehow.
Also, check out Apache's mod_auth_ldap. It is said to support "Complex authorization policies can be implemented by representing the policy with LDAP filters."
I don't know what your app does that it doesn't use some kind of database for stuff. Good for you for not taking the easy way out! Here's where a flat text file with JSON can go a long way.
It seems what you're describing is an Access Control List (ACL) to accompany authentication, since you're going beyond 'groups' to specific actions within that group. To create an ACL without a database separate from your authentication means, I'd suggest taking a look at the Zend Framework for PHP, specifically the ACL module.
In your AD settings, assign users to groups (you mention "managers", you'd likely have "users", "administrators", possibly some department-specific groups, and a generic "public" if a user is not part of a group). The Zend ACL module allows you to define "resources" (correlating to page names in your example), and 'actions' within those resources. These are then saved in the session as an object that can be referred to to determine if a user has access. For example:
<?php
$acl = new Zend_Acl();
$acl->addRole(new Zend_Acl_Role('public'));
$acl->addRole(new Zend_Acl_Role('users'));
$acl->addRole(new Zend_Acl_Role('manager'), 'users');
$acl->add(new Zend_Acl_Resource('about')); // Public 'about us' page
$acl->add(new Zend_Acl_Resource('personnel_payroll')); // Payroll page from original post
$acl->allow('users', null, 'view'); // 'users' can view all pages
$acl->allow('public', 'about', 'view'); // 'public' can only view about page
$acl->allow('managers', 'personnel_payroll', 'edit'); // 'managers' can edit the personnel_payroll page, and can view it since they inherit permissions of the 'users' group
// Query the ACL:
echo $acl->isAllowed('public', 'personnel_payroll', 'view') ? "allowed" : "denied"; // denied
echo $acl->isAllowed('users', 'personnel_payroll', 'view') ? "allowed" : "denied"; // allowed
echo $acl->isAllowed('users', 'personnel_payroll', 'edit') ? "allowed" : "denied"; // denied
echo $acl->isAllowed('managers', 'personnel_payroll', 'edit') ? "allowed" : "denied"; // allowed
?>
The benefit to separating the ACL from the AD would be that as the code changes (and what "actions" are possible within the various areas), granting access to them is in the same location, rather than having to administrate the AD server to make the change. And you're using an existing (stable) framework so you don't have to reinvent the wheel with your own isAuthorized function.

Resources