One aggregate has to reference other aggregates by id; for instance, an order stores a userId. So if I need the user entity to do something in the order aggregate, I should pass it like this: order.doSomething(user). But where should I retrieve the user: in the application service or in a domain service?
You don't.
An Aggregate operates on, and depends on, only the data that it owns. This applies to reads as well as writes.
If an Aggregate, for example User, needs some data from another Aggregate, for example Order, then the Application service (or, more probably, a Saga/Process manager) gets the data from the Order and passes it to the User:
user.doSomething(order.some, order.info)
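A minimal sketch of that orchestration, assuming hypothetical repositories and accessors (OrderRepository, UserRepository, and the some()/info() getters are illustrative names, not a prescribed API):

// Hypothetical application service: it loads both aggregates, but passes only
// plain values from Order into User, never the Order aggregate itself.
class UserApplicationService {
    private final OrderRepository orders;
    private final UserRepository users;

    UserApplicationService(OrderRepository orders, UserRepository users) {
        this.orders = orders;
        this.users = users;
    }

    void doSomething(String userId, String orderId) {
        Order order = orders.findById(orderId);
        User user = users.findById(userId);
        user.doSomething(order.some(), order.info()); // plain values cross the boundary
        users.save(user);
    }
}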
Related
What data should be contained in the event? Only data that is specific to this event, or some data from the bounded context too?
For example, I have an account with domain and name properties:
account(id, name, domain)
When I change the account name, a NameChanged(id, name) event is created. But when this event is used for a read-side projection (Cassandra DB), I need to fill two tables (the example does not use a materialized view):
accounts(id, name, domain) (primary key only `id`)
accountsByDomain(domain, id, name) (primary key contains `domain` and `id`)
The second table cannot be updated from this event alone, because there is no domain in the event.
Question: should the event be as simple as possible (with the projection querying the entity's current state, which might differ from the state at the time the event occurred), or should it carry complete information for the read-side projection?
We aren't usually limited to processing an event in isolation - the identifiers in the event are available to allow us to look up the other information we need (which could, for example, be included in other events in the same stream).
Reviewing Greg Young's talk on Polyglot Data may help clarify this idea.
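As a sketch of that idea: the projector can keep the domain in its own accounts table (filled earlier from a hypothetical AccountCreated(id, name, domain) event) and consult it when NameChanged arrives. All names below are illustrative:

// Hypothetical projection handler: NameChanged carries no domain, so the
// projector looks the domain up in its own already-projected accounts table.
void on(NameChanged event) {
    AccountRow current = accountsTable.findById(event.id()); // row contains the domain column
    accountsTable.updateName(event.id(), event.name());
    accountsByDomainTable.updateName(current.domain(), event.id(), event.name());
}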
Let's assume the following scenario:
We have Users of the system
Each User has their own Clients (a Client is always assigned to one and only one User)
Users upload different Documents and a Document is always assigned to one and only one Client
One of the business rules is that a User can upload up to X Documents in total, regardless of the number of Clients.
By the book, I would make User an aggregate root containing a collection of Clients. Each Client would in turn have a collection of Documents uploaded for that particular client. When a User attempts to upload a new Document for a given Client, we would load the User aggregate root with all of its Clients and their Documents, and on the User class I'd have a method like:
boolean CanUploadDocument()
{
    // Iterate Clients and sum up the total number of their documents
    int numberOfDocuments = clients.stream()
            .mapToInt(client -> client.getDocuments().size())
            .sum();
    // Compare to the maximum allowed number of docs for this User instance
    return numberOfDocuments < this.maxAllowedNumberOfDocuments;
}
All well and good, but maxAllowedNumberOfDocuments can be in the thousands or tens of thousands, and it feels like huge overkill to load all those documents from the db just to count and compare them.
Putting int documentsCount on User seems like breaking the rules and introducing unnecessary redundancy.
Is this a case for introducing a separate aggregate root like UserQuota, where we would load just the count of all Documents and do the check? Or maybe a value object UserDocumentCount, which a service would fetch and pass to a method on the User object:
boolean CanUploadDocument(UserDocumentCount count)
{
    // Compare to the maximum allowed number of docs for this User instance
    // (assuming the value object exposes its underlying int)
    return count.value() < this.maxAllowedNumberOfDocuments;
}
What is the DDD-proper and optimized way to handle this?
Having a big User aggregate is not a solution, but not because it is slow and needs optimization; it's because of the cohesion of its internal fields.
In order to protect the quota limit, the User aggregate needs only the uploaded documents and nothing more. This is a sign that you in fact have two aggregates, the second being UserDocuments with its method uploadDocument. This method internally checks the quota invariant. As an optimization, you could keep an int countOfDocumentsUploadedSoFar that is used in the uploadDocument method. The two aggregates share only the same identity (the UserId).
Note: no inheritance is needed between the two aggregates.
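A minimal sketch of that second aggregate, with assumed names and the count-based optimization:

// Hypothetical UserDocuments aggregate: it shares the UserId identity with
// User and enforces the quota from a running count, without loading documents.
class UserDocuments {
    private final UserId userId;
    private final int maxAllowedNumberOfDocuments;
    private int countOfDocumentsUploadedSoFar;

    UserDocuments(UserId userId, int maxAllowedNumberOfDocuments, int currentCount) {
        this.userId = userId;
        this.maxAllowedNumberOfDocuments = maxAllowedNumberOfDocuments;
        this.countOfDocumentsUploadedSoFar = currentCount;
    }

    void uploadDocument(Document document) {
        if (countOfDocumentsUploadedSoFar >= maxAllowedNumberOfDocuments) {
            throw new IllegalStateException("Document quota exceeded for user " + userId);
        }
        countOfDocumentsUploadedSoFar++;
        // ... record the uploaded document itself
    }
}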
Introducing something like UserQuota looks like a good solution. This thing is a real domain concept; it has a right to be an entity. Right now it has one property, DocumentsCount, but in time you will probably need LastDocumentUploadedTime... MaxAllowedNumberOfDocuments can be part of the quota too; that will help when this number changes and the change should be applied only to new quotas, or when quotas become more personalized.
Your domain operations should touch quotas too. For example, when uploading a document you first read the appropriate quota and check it, then store the document, then update the quota.
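A sketch of that flow in an application service, with hypothetical repository and quota method names:

// Hypothetical upload flow: read the quota, check it, store the document,
// then update the quota.
void uploadDocument(UserId userId, Document document) {
    UserQuota quota = quotas.findByUserId(userId);
    if (!quota.canUploadDocument()) {
        throw new IllegalStateException("Quota exceeded for user " + userId);
    }
    documents.save(document);
    quota.incrementDocumentsCount();
    quotas.save(quota);
}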
I'm new to CQRS and need advice on the following situation in my design. A command updates the state of an aggregate A; the read model then needs to be updated with the result of a cross-aggregate calculation method; this method belongs to another aggregate B, which holds a reference to aggregate A; the method is a function of the states of both aggregate B and the referenced aggregate A. Where is the correct place for this function to be called?
My considerations (can be skipped):
The command handler updating the state of aggregate A could technically fetch aggregate B from the repository, call the calculation on it, and put the result in the domain event; however, I believe it's not the command handler's job to fetch aggregates other than the one being modified, even for reading purposes; nor is it the command handler's job to perform calculations just to send them along with events rather than to modify the state of the domain.
The domain event ('Aggregate A updated') raised by aggregate A contains only the updated state of aggregate A; there's not enough info on the state of aggregate B. The read model's event handler has no access to the domain model, so it can neither fetch aggregate B nor call the desired function on it to update the read model.
I know that any state needed by a command which is external to the aggregate being modified must be passed along with the command. This way the application service, before sending the command, could fetch the state of aggregate B (from the read model) and put it in the command. For that I would have to move the function from aggregate B to some service and pass it the states of both A and B. That would make aggregate B more anemic. Plus, there is the above-mentioned problem with doing calculations within the command handler.
I've read people suggesting that any calculations that only the read model is interested in belong to the read model itself. So the read model's handler of my event would have at its disposal all the state and behavior needed to perform the calculations. However, that would mean I have to duplicate many of the domain model concepts on the query side; it would be too complex to have a full-blown read model.
I've just thought of the following solution: within the domain, create a handler for the domain event 'Aggregate A updated'. It would fetch aggregate B, call the calculation method on it, then raise an 'Aggregate B function result changed' event with the new calculation result in it. Then the read model is able to take the result from this event and update itself. Would this be OK?
Note, just in case: I'm not using Event Sourcing.
Any thoughts on this situation would be much appreciated. Thanks!
UPDATE: making the situation more concrete
My aggregates are Workers (Aggregate B) and Groups of workers (Aggregate A). Workers and Groups are in a many-to-many relationship. Imagine both a Group and a Worker have some Value property. A Worker's calculateValue() is a function of the Worker's own Value plus the Values of all Groups the Worker participates in. The command described above modifies the Value of some Group. As a result, all Workers participating in that group would return a different result from calculateValue().
What do I want from the read model? I want a list of Workers with calculated Values (that already account for the Values of all the Worker's groups). I don't even need Groups on the read side. If I go the 'do the calculation on the read side' route, I need Groups as well as the whole structure of relationships there. I'm afraid that would be an unjustified complication.
The command handler updating the state of aggregate A could technically fetch aggregate B from the repository, call the calculation on it, and put the result in the domain event; however, I believe it's not the command handler's job to fetch aggregates other than the one being modified, even for reading purposes; nor is it the command handler's job to perform calculations just to send them along with events rather than to modify the state of the domain.
This is not OK because events should represent facts that happened in regard to a single Aggregate.
I know that any state needed by a command which is external to the aggregate being modified must be passed along with the command. This way the application service, before sending the command, could fetch the state of aggregate B (from the read model) and put it in the command. For that I would have to move the function from aggregate B to some service and pass it the states of both A and B. That would make aggregate B more anemic. Plus, there is the above-mentioned problem with doing calculations within the command handler.
You should not send the Aggregate state in an event. In fact, you should not query the Aggregate or use its internal, private state in any way other than through the Aggregate itself. In CQRS the Aggregate is not to be queried; that is the read model's purpose.
I've read people suggesting that any calculations that only the read model is interested in belong to the read model itself. So the read model's handler of my event would have at its disposal all the state and behavior needed to perform the calculations. However, that would mean I have to duplicate many of the domain model concepts on the query side; it would be too complex to have a full-blown read model.
This is the way to go. However, what do you duplicate anyway? Is the result of that calculation used by the Aggregate to accept or reject any of its commands?
If yes, then it should be done inside the Aggregate at command execution time, and possibly the final result sent along with the event, but only if the calculation can be done with the data from the command and/or the Aggregate's internal state, not from cross-Aggregate state. If an Aggregate needs data from other Aggregates, that is a sign that your Aggregate boundaries might be wrong.
If not, then the calculation should not live inside the Aggregate, but only in the read model.
In CQRS, by splitting the Write model from the Read model you also split the calculations between Write and Read, but there are cases where a calculation is shared by the two models. In those cases you can extract the calculation into a class and use that class in both models.
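Using the Worker/Group example from the question, a sketch of such a shared class (names assumed):

import java.util.Collection;

// Hypothetical shared calculation: both the Worker aggregate (write side) and
// the projection (read side) can call this without duplicating the formula.
final class WorkerValueCalculator {
    private WorkerValueCalculator() {}

    // Worker's own value plus the values of all groups it participates in.
    static int calculateValue(int workerValue, Collection<Integer> groupValues) {
        return workerValue + groupValues.stream().mapToInt(Integer::intValue).sum();
    }
}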
Event Sourcing works perfectly when we query by a particular unique EntityID, but when I try to get information from the event store by anything other than a particular EntityId, I am having a tough time.
I am using CQRS with Event Sourcing. As part of event sourcing, we store the events in a SQL table with columns (EntityID (unique key), EventType, EventObject (e.g. UserAdded)).
When storing the EventObject we just serialize the .NET object and store it in SQL, so all the details related to a UserAdded event are in XML format. My concern: I want to make sure the userName present in the db is unique.
So, when issuing an AddUser command, I have to query the event store (SQL db) to see whether the particular userName is already present. To do that, I need to deserialize all the UserAdded/UserEdited events in the event store and check whether the requested username is present.
But in CQRS, commands are not supposed to query, perhaps because of race conditions.
So I tried, before sending the AddUser command, to query the event store and get all the userNames by deserializing all the UserAdded events; if the requested username is unique, the command is sent, otherwise an exception is thrown that the userName already exists.
With the above approach, we need to query the entire db, and we may have hundreds of thousands of events per day, so the query/deserialization will take a long time, leading to a performance issue.
I am looking for a better approach/suggestion for keeping the username unique, whether by getting all userNames from the event store or any other way.
So, your client (the thing that issues the commands) should have full faith that the command it sends will be executed, and it must do this by ensuring, before it sends the RegisterUserCommand, that no other user is registered with that email address. In other words, your client must perform the validation, not your domain or even the application services that surround the domain.
From http://cqrs.nu/Faq
This is a commonly occurring question since we're explicitly not performing cross-aggregate operations on the write side. We do, however, have a number of options:

1. Create a read-side of already allocated user names. Make the client query the read-side interactively as the user types in a name.

2. Create a reactive saga to flag down and inactivate accounts that were nevertheless created with a duplicate user name. (Whether by extreme coincidence or maliciously or because of a faulty client.)

3. If eventual consistency is not fast enough for you, consider adding a table on the write side, a small local read-side as it were, of already allocated names. Make the aggregate transaction include inserting into that table.
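A sketch of the third option using plain JDBC, with an assumed allocated_user_names table carrying a unique constraint on user_name; the insert and the event append (a hypothetical helper here) share one transaction, so a duplicate name aborts both:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import javax.sql.DataSource;

// The unique constraint makes the database reject a duplicate name
// atomically with the event append.
void registerUser(DataSource dataSource, String userName) throws SQLException {
    try (Connection cn = dataSource.getConnection()) {
        cn.setAutoCommit(false);
        try (PreparedStatement ps = cn.prepareStatement(
                "INSERT INTO allocated_user_names (user_name) VALUES (?)")) {
            ps.setString(1, userName); // throws on unique-constraint violation
            ps.executeUpdate();
            appendUserRegisteredEvent(cn, userName); // hypothetical: append the event in the same tx
            cn.commit();
        } catch (SQLException e) {
            cn.rollback(); // duplicate name: neither the row nor the event is persisted
            throw e;
        }
    }
}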
Querying different aggregates with a repository in a write operation as part of your business logic is not forbidden. You can do that in order to accept the command or reject it due to a duplicate user, by using some domain service (a cross-aggregate operation). Greg Young mentions this here: https://www.youtube.com/watch?v=LDW0QWie21s&t=24m55s
In normal scenarios you would just need to query all the UserCreated + UserEdited events.
If you expect to have thousands of these events per day, maybe your events are bloated and you should design more atomically. For example, instead of having a UserEdited event raised every time something happens to a user, consider having UserPersonalDetailsEdited and UserAccessInfoEdited or similar, where the fields that must be unique are treated differently from the rest of the user fields. That way, querying all the UserCreated + UserAccessInfoEdited events before accepting a command or not would be a lighter operation.
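A sketch of that split, with assumed fields; only the first two event types can touch the unique fields, so only they need replaying for the check:

import java.util.UUID;

// Hypothetical atomic events: uniqueness checks only replay the event types
// that can change the unique fields.
record UserCreated(UUID userId, String userName, String email) {}
record UserAccessInfoEdited(UUID userId, String userName, String email) {}
record UserPersonalDetailsEdited(UUID userId, String displayName, String address) {}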
Personally I'd go with the following approach:
More atomicity in events so that everything that touches fields that should be globally unique is described more explicitly (e.g. UserCreated, UserAccessInfoEdited)
Have projections available on the write side in order to query them during a write operation. For example, I'd subscribe to all UserCreated and UserAccessInfoEdited events in order to keep a queryable "table" with all the unique fields (e.g. email).
When a CreateUser command arrives in the domain, a domain service would query this email table and accept or reject the command, as sketched below.
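A sketch of that check, assuming a hypothetical emailProjection kept up to date from UserCreated and UserAccessInfoEdited events (all names illustrative):

// Hypothetical command handler: the write-side projection answers the
// uniqueness question before the command is accepted.
void handle(CreateUser command) {
    if (emailProjection.isTaken(command.email())) {
        throw new IllegalArgumentException("Email already in use: " + command.email());
    }
    User user = User.create(command); // raises UserCreated
    users.save(user);
}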
This solution relies a bit on eventual consistency, and there's a possibility that the query tells us the field has not been used and allows the command to succeed, raising a UserCreated event, when in fact the projection had not yet been updated from a previous transaction, therefore causing a situation where two values in the system are not globally unique.
If you want to completely avoid these uncertain situations because your business can't really deal with eventual consistency, my recommendation is to handle this in your domain by explicitly modeling it as part of your ubiquitous language. For example, you could model your aggregates differently, since it's obvious that your User aggregate is not really your transactional boundary (i.e. it depends on others).
As often, there's no right answer, only answers that fit your domain.
Are you in an environment that really requires immediate consistency? What would be the odds of an identical user name being created between the moment uniqueness is checked by querying (say, on the client side) and the moment the command is processed? Would your domain experts tolerate, for instance, one user name conflict out of 1 million (which can be compensated for afterwards)? Will you have a million users in the first place?
Even if immediate consistency is required, "user names should be unique"... in which scope? A Company? An OnlineStore? A GameServerInstance? Can you find the most restricted scope in which the uniqueness constraint must hold and make that scope the Aggregate Root from which to sprout a new user? Why would the "replay all the UserAdded/UserEdited events" solution be bad after all, if the Aggregate Root keeps these events small and simple?
With GetEventStore (from Greg Young) you can use any string as your aggregateId/StreamId. Use the username as the id of the aggregate instead of a GUID, or a combination like "mycompany.users.john" as the key, and... voilà! You get user name uniqueness for free!
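A sketch of the idea with a hypothetical event-store client (the method and constant names below are illustrative, not GetEventStore's actual API):

// Appending with an expected version of "no stream" fails if the stream
// already exists, so the name-derived stream id gives uniqueness for free.
String streamId = "mycompany.users." + userName;
eventStore.appendToStream(streamId, ExpectedVersion.NO_STREAM,
        new UserRegistered(userName)); // throws if the name is already taken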
We have two entities, User and Role. One User can have multiple Roles, and a single Role can be shared by many Users: a typical m:n relation.
Roles are also dynamic, and we expect a large number of them (millions).
It is quite simple to model such data in a relational DB. I would like to find out whether it is possible in Cassandra.
Currently I see two solutions:
A) Use a normalized model and create something similar to an inner join
Create each role in a separate CF and store foreign keys to the referenced roles in the User record.
pro: Roles are not replicated and maintenance is simple
contra: In order to get all Roles for a single User, multiple network calls are necessary. The User record contains only FKs; Roles are stored using the random partitioner, so each role could be stored on a different Cassandra node.
B) Denormalize the model and replicate roles to avoid round trips
In this scenario, the User record in Cassandra contains a copy of all the user's roles.
pro: It is possible to read a User with all roles in a single query. This guarantees short load times.
contra: Each shared Role is copied multiple times, once onto each related User. Maintaining roles is very difficult, especially with a large amount of data. For example: one Role is shared by 1000 users; changes to this Role require updates to 1000 User records. For very large data sets, such updates have to be executed as an asynchronous job.
The solutions above are very limited; maybe Cassandra is not the right solution for m:n relations? Do you know any Cassandra design pattern for such a problem?
Thanks,
Maciej
The way you want to design a data store in Cassandra is to start with the queries you plan to execute and make it so you can get all the information you need at once. Denormalization is the name of the game here; if you're not replicating that role information in each user node, you're not going to avoid disk seeks, and your read performance will suffer. Joins do not make sense; if you want a relational database, use a relational database.
At a guess, you're going to ask a lot of questions about what roles a user has and what they should be doing with them, so you definitely want to have role information duplicated in each user entry - probably with each role getting its own column (role-ROLE_KEY => serialized-capability-info instead of roles => [serialized array of capability info]). Your application will need some way to iterate over all those columns itself.
You will probably want to look at what users are in a role, and so you should probably store all the user information you'll need for that view in the role column family as well (though a subset of the full user record will do).
When you run updates, and add/remove users from roles, you will need to make sure that you update both the role's list of users and the user's roles at the same time. Because you're using a column for each relation, instead of a single shared serialized blob, this should work even if you're editing two different roles that share the same user at the same time: Cassandra can merge the updates, including the deletes.
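A rough sketch of that layout with the DataStax Java driver, using hypothetical keyspace/table names; in modern CQL the "column per relation" becomes a clustering row per (user, role) pair:

import com.datastax.oss.driver.api.core.CqlSession;

// Both directions of the m:n relation get their own table, each answerable
// from a single partition read.
class RoleSchemaSketch {
    public static void main(String[] args) {
        try (CqlSession session = CqlSession.builder().withKeyspace("app").build()) {
            // "What roles does this user have?" - one partition per user,
            // one row per role, with the capability info duplicated in.
            session.execute("CREATE TABLE IF NOT EXISTS user_roles ("
                    + "user_id text, role_key text, capability_info text, "
                    + "PRIMARY KEY (user_id, role_key))");
            // "Which users are in this role?" - the reverse direction,
            // with a subset of the user record duplicated in.
            session.execute("CREATE TABLE IF NOT EXISTS role_users ("
                    + "role_key text, user_id text, user_summary text, "
                    + "PRIMARY KEY (role_key, user_id))");
            // Adding a user to a role writes both directions.
            session.execute("INSERT INTO user_roles (user_id, role_key, capability_info) "
                    + "VALUES ('john', 'admin', 'read,write')");
            session.execute("INSERT INTO role_users (role_key, user_id, user_summary) "
                    + "VALUES ('admin', 'john', 'John D.')");
        }
    }
}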
If the query needs to be asynchronous, then go make your application handle it. Remember that Cassandra is an eventual-consistency data store and you shouldn't expect updates to be visible everywhere immediately anyway.
Another option these days is to use playORM, which can do joins for you ;). You just decide how to partition your data. It uses Scalable JQL, which is a simple extension of JQL, as follows:
#NoSqlQuery(name="findJoinOnNullPartition", query="PARTITIONS t('account', :partId) select t FROM Trade as t INNER JOIN t.security as s where s.securityType = :type and t.numShares = :shares")
So we can finally normalize our data on a NoSQL system AND scale at the same time. We don't need to give up normalization, which has certain benefits.
Dean