Let's assume the following scenario:
We have Users of the system.
Each User has their own Clients (a Client is always assigned to one and only one User).
Users upload Documents, and a Document is always assigned to one and only one Client.
One of the business rules is that a User can upload up to X Documents in total, regardless of the number of Clients.
By the book, I would make User an aggregate root containing a collection of Clients, and each Client would have a collection of Documents uploaded for that particular Client. When a User attempts to upload a new Document for a given Client, we would load the User's aggregate root with all of its Clients and their Documents, and on the User class I'd have a method like:
boolean CanUploadDocument()
{
    // Iterate over the Clients and sum up the total number of their Documents.
    int numberOfDocuments = 0;
    for (Client client : this.clients) {
        numberOfDocuments += client.getDocuments().size();
    }
    // Compare against the maximum allowed number of documents for this User instance.
    return numberOfDocuments < this.maxAllowedNumberOfDocuments;
}
All well and good, but maxAllowedNumberOfDocuments can be in the thousands or tens of thousands, and it feels like huge overkill to load all of those Documents from the database just to count and compare them.
Putting an int documentsCount on User seems like breaking the rules and introducing unnecessary redundancy.
Is this a case for introducing a separate aggregate root like UserQuota, where we would load just the count of all Documents and do the check? Or maybe a value object UserDocumentCount, which a service would fetch and pass to a method on the User object:
boolean CanUploadDocument(UserDocumentCount count)
{
    // Compare against the maximum allowed number of documents for this User instance
    // (assuming UserDocumentCount exposes its underlying int value).
    return count.getValue() < this.maxAllowedNumberOfDocuments;
}
What is the DDD-proper and optimized way to handle this?
Having a big User aggregate is not a solution, but not because it is slow and needs optimization; it's because of the (lack of) cohesion among its internal fields.
In order to protect the quota limit, the User aggregate needs only the uploaded Documents and nothing more. This is a sign that you in fact have two aggregates, the second being UserDocuments with its method uploadDocument. This method internally checks the quota invariant. As an optimization, you could keep an int countOfDocumentsUploadedSoFar that is used in the uploadDocument method. The two aggregates share only the same identity (the UserId).
Note: no inheritance is needed between the two aggregates.
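A rough sketch of what that second aggregate could look like, keeping the names used above (the Document record, field names and constructor are illustrative only):

record Document(String clientId, String fileName) {}

public class UserDocuments {

    private final String userId;                        // same identity as the User aggregate
    private final int maxAllowedNumberOfDocuments;
    private int countOfDocumentsUploadedSoFar;          // the optimization: no need to load every Document

    public UserDocuments(String userId, int maxAllowedNumberOfDocuments, int countOfDocumentsUploadedSoFar) {
        this.userId = userId;
        this.maxAllowedNumberOfDocuments = maxAllowedNumberOfDocuments;
        this.countOfDocumentsUploadedSoFar = countOfDocumentsUploadedSoFar;
    }

    public Document uploadDocument(String clientId, String fileName) {
        // The quota invariant is checked inside the aggregate that owns it.
        if (countOfDocumentsUploadedSoFar >= maxAllowedNumberOfDocuments) {
            throw new IllegalStateException("Document quota exceeded for user " + userId);
        }
        countOfDocumentsUploadedSoFar++;
        return new Document(clientId, fileName);
    }
}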
Introducing something like UserQuota looks like a good solution. It is a real domain concept and has a right to be an entity. Right now it has one property, DocumentsCount, but in time you will probably need LastDocumentUploadedTime and so on. MaxAllowedNumberOfDocuments can be part of the quota too; that helps when this number changes and the change should apply only to new quotas, or when quotas become more personalized.
Your domain operations should touch quotas too. For example, when uploading a document you first read the appropriate quota and check it, then store the document, then update the quota.
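A sketch of that flow in an application service; the UserQuota entity, the repository interfaces and their method names are invented here for illustration:

class UserQuota {
    private final String userId;
    private int documentsCount;
    private final int maxAllowedNumberOfDocuments;

    UserQuota(String userId, int documentsCount, int maxAllowedNumberOfDocuments) {
        this.userId = userId;
        this.documentsCount = documentsCount;
        this.maxAllowedNumberOfDocuments = maxAllowedNumberOfDocuments;
    }

    boolean canUploadDocument() { return documentsCount < maxAllowedNumberOfDocuments; }
    void registerUpload()       { documentsCount++; }
}

interface UserQuotaRepository { UserQuota byUserId(String userId); void save(UserQuota quota); }
interface DocumentRepository  { void save(String clientId, String fileName); }

class UploadDocumentService {
    private final UserQuotaRepository quotas;
    private final DocumentRepository documents;

    UploadDocumentService(UserQuotaRepository quotas, DocumentRepository documents) {
        this.quotas = quotas;
        this.documents = documents;
    }

    void uploadDocument(String userId, String clientId, String fileName) {
        UserQuota quota = quotas.byUserId(userId);   // read the appropriate quota
        if (!quota.canUploadDocument()) {
            throw new IllegalStateException("Upload quota exceeded for user " + userId);
        }
        documents.save(clientId, fileName);          // store the document
        quota.registerUpload();
        quotas.save(quota);                          // then update the quota
    }
}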
Are there any recommended architectural patterns with Service Bus for ensuring ordered processing of nested groups of messages which are sent out of order? We are using Sessions, but when it comes down to ensuring that a set of Sessions must be processed sequentially in a certain order before moving on to another set of Sessions, the architecture becomes cumbersome very quickly. This question might best be illustrated with an example.
We are using Service Bus to integrate changes in real-time from a database to a third-party API. Every N minutes, we get notified of a new 'batch' of changes from the database which consists of individual records of data across different entities. We then transform/map each record and send it along to an API. For example, a 'batch' of changes might include 5 new/changed 'Person' records, 3 new/changed 'Membership' records, etc.
At the outermost level, we must always process one entire batch before we can move on to another batch of data, but we also have a requirement to process each type of entity in a certain order. For example, all 'Person' changes must be processed for a given batch before we can move on to any other objects.
There is no guarantee that these records will be queued up in any order which is relevant to how they will need to be processed, particularly within a 'batch' of changes (e.g. the data from different entity types will be interleaved).
We actually do not necessarily need to send the individual records of entity data in any order to the API (e.g. it does not matter in which order I send those 5 Person records for that batch, as long as they are all sent before the 3 Membership records for that batch). However, we do group the messages into Sessions by entity type so that we can guarantee homogeneous records in a given session and target all records for that entity type (this also helps us support a separate requirement we have when calling the API to send a batch of records when possible instead of an individual call per record to avoid API rate limiting issues). Currently, our actual Topic Subscription containing the record data is broken up into Sessions which are unique to the entity type and the batch.
"SessionId": "Batch1234\Person"
We are finding that it is cumbersome to manage the requirement that all changes for a given batch must be processed before we move on to the next batch, because there is no Session which reliably groups those "groups of entities" together (let alone processing those groups of entities themselves in a certain order). There is, of course, no concept of a 'session of sessions', and we are currently handling this by having a separate 'Sync' queue to represent an entire batch of changes, which describes what sessions of data are contained in that batch:
"SessionId": "Batch1234",
"Body":
{
"targets": ["Batch1234\Person", "Batch1234\Membership", ...]
}
This is quite cumbersome, because something (e.g. a Durable Azure Function) now has to orchestrate the entire process by watching the Sync queue and then spinning off separate processors that it oversees to ensure correct ordering at each level (which makes concurrency management and scalability much more complicated to deal with). If this is indeed a good pattern, then I do not mind implementing the extra orchestration architecture to ensure a robust, scalable implementation. However, I cannot help feeling that I am missing something or not thinking about the architecture the right way.
Is anyone aware of any other recommended pattern(s) in Service Bus for handling ordered processing of groups of data which themselves contain groups of data which must be processed in a certain order?
For the record, I'm not a Service Bus expert specifically.
The entire batch construct sounds painful - can you do away with it? Often if you have a painful input, you'll have a painful solution - the old "crap in, crap out" maxim. Sometimes it's just hard to find an elegant solution.
Do the 'sets of sessions' need to be processed in a specific order?
Is a 'batch' of changes = a session?
I can't think of a specific pattern, but a "divide and conquer" approach seems reasonable (which is roughly what you have already?):
Watch for new batches; when one occurs, hand it off to a BatchProcessor.
The BatchProcessor applies all the rules to the batch, as you outlined.
Consider having the BatchProcessor dump its results on a queue of some kind which is the source for the API - that way you have some isolation between the batch processing and the API.
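For what it's worth, a rough skeleton of that shape; BatchProcessor, SessionReceiver and OutputQueue are hypothetical placeholders here, not Service Bus SDK types:

import java.util.List;

// Hypothetical orchestration skeleton: a watcher on the 'Sync' queue hands each
// new batch to a BatchProcessor, which drains the entity sessions in the required
// order and drops the results on an output queue that feeds the API caller.
interface SessionReceiver { List<String> drainSession(String sessionId); }
interface OutputQueue { void send(List<String> mappedRecords); }

public class BatchProcessor {

    // Order in which entity sessions must be processed within one batch.
    private static final List<String> ENTITY_ORDER = List.of("Person", "Membership");

    private final SessionReceiver receiver;
    private final OutputQueue output;

    public BatchProcessor(SessionReceiver receiver, OutputQueue output) {
        this.receiver = receiver;
        this.output = output;
    }

    public void process(String batchId) {
        for (String entityType : ENTITY_ORDER) {
            // e.g. "Batch1234\Person" - drain the whole session before moving on.
            List<String> records = receiver.drainSession(batchId + "\\" + entityType);
            // Transform/map the records here, then hand them to the output queue so
            // batch processing stays isolated from the API (and its rate limits).
            output.send(records);
        }
    }
}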
Event sourcing works perfectly when we have a particular unique EntityID, but when I try to get information out of the event store by anything other than that EntityId, I am having a tough time.
I am using CQRS with event sourcing. As part of event sourcing we are storing the events in a SQL table with columns (EntityID (unique key), EventType, EventObject (e.g. UserAdded)).
When storing the EventObject we just serialize the .NET object and store it in SQL, so all the details related to the UserAdded event are in XML format. My concern is that I want to make sure the userName present in the db is unique.
So, while handling the AddUser command I would have to query the event store (the SQL db) to find out whether that particular userName is already present. To do that I would need to deserialize all the UserAdded/UserEdited events in the event store and check whether the requested username is present.
But in CQRS, commands are not supposed to query, presumably because of race conditions.
So I tried, before sending the AddUser command, to query the event store and get all the userNames by deserializing all UserAdded events; if the requested username is unique, send the command, otherwise throw an exception that the userName already exists.
With the above approach we need to query the entire db, and we may have hundreds of thousands of events per day, so the query/deserialization takes a lot of time and leads to a performance issue.
I am looking for a better approach/suggestion for keeping usernames unique, either by getting all userNames from the event store or by any other means.
So, your client (the thing that issues the commands) should have full faith that the command it sends will be executed, and it must do this by ensuring, before it sends the RegisterUserCommand, that no other user is registered with that email address. In other words, your client must perform the validation, not your domain or even the application services that surround the domain.
From http://cqrs.nu/Faq
This is a commonly occurring question since we're explicitly not performing cross-aggregate operations on the write side. We do, however, have a number of options:
Create a read-side of already allocated user names. Make the client query the read-side interactively as the user types in a name.
Create a reactive saga to flag down and inactivate accounts that were nevertheless created with a duplicate user name. (Whether by extreme coincidence or maliciously or because of a faulty client.)
If eventual consistency is not fast enough for you, consider adding a table on the write side, a small local read-side as it were, of already allocated names. Make the aggregate transaction include inserting into that table.
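A minimal sketch of that last option, assuming a SQL-based event store like the one described in the question; the reserved_usernames table (with a UNIQUE constraint on user_name) and the column names are made up for illustration:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// The unique-name reservation and the event append share one transaction,
// so a duplicate name makes the whole registration fail.
public class UserRegistrationHandler {

    public void handle(Connection con, String userId, String userName, String userAddedXml) throws SQLException {
        con.setAutoCommit(false);
        try {
            // 1. Reserve the name; a UNIQUE constraint on user_name rejects duplicates.
            try (PreparedStatement reserve = con.prepareStatement(
                    "INSERT INTO reserved_usernames (user_name, entity_id) VALUES (?, ?)")) {
                reserve.setString(1, userName);
                reserve.setString(2, userId);
                reserve.executeUpdate();
            }
            // 2. Append the UserAdded event to the event store in the same transaction.
            try (PreparedStatement append = con.prepareStatement(
                    "INSERT INTO events (entity_id, event_type, event_object) VALUES (?, ?, ?)")) {
                append.setString(1, userId);
                append.setString(2, "UserAdded");
                append.setString(3, userAddedXml);
                append.executeUpdate();
            }
            con.commit();
        } catch (SQLException e) {
            con.rollback();   // duplicate name (unique violation) or any other failure
            throw e;
        }
    }
}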
Querying different aggregates with a repository in a write operation as part of your business logic is not forbidden. You can do that in order to accept the command or reject it due to duplicate user by using some domain service (a cross-aggregate operation). Greg Young mentions this here: https://www.youtube.com/watch?v=LDW0QWie21s&t=24m55s
In normal scenarios you would just need to query all the UserCreated + UserEdited events.
If you expect to have thousands of these events per day, maybe your events are bloated and you should design them more atomically. For example, instead of having a UserEdited event raised every time something happens to a user, consider having UserPersonalDetailsEdited and UserAccessInfoEdited or similar, where the fields that must be unique are treated differently from the rest of the user fields. That way, querying all the UserCreated + UserAccessInfoEdited events before accepting or rejecting a command would be a lighter operation.
Personally I'd go with the following approach:
More atomicity in events, so that everything that touches fields that should be globally unique is described more explicitly (e.g. UserCreated, UserAccessInfoEdited).
Have projections available on the write side in order to query them during a write operation. For example, I'd subscribe to all UserCreated and UserAccessInfoEdited events in order to keep a queryable "table" with all the unique fields (e.g. email).
When a CreateUser command arrives at the domain, a domain service would query this email table and accept or reject the command (a rough sketch follows).
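All the type names in this sketch (EmailsInUseProjection, EventStore, CreateUser, DuplicateEmailException) are hypothetical; the shape is what matters: a domain/application service consults a small write-side projection of the unique fields before accepting the command.

interface EmailsInUseProjection { boolean isTaken(String email); }
interface EventStore { void append(String streamId, Object event); }

record CreateUser(String userId, String email) {}
record UserCreated(String userId, String email) {}

class DuplicateEmailException extends RuntimeException {
    DuplicateEmailException(String email) { super("Email already in use: " + email); }
}

class CreateUserHandler {
    private final EmailsInUseProjection emails;
    private final EventStore eventStore;

    CreateUserHandler(EmailsInUseProjection emails, EventStore eventStore) {
        this.emails = emails;
        this.eventStore = eventStore;
    }

    void handle(CreateUser cmd) {
        // Domain-service style check against the write-side projection of unique fields.
        if (emails.isTaken(cmd.email())) {
            throw new DuplicateEmailException(cmd.email());
        }
        // Accept the command; the projection is updated when UserCreated is projected.
        eventStore.append("user-" + cmd.userId(), new UserCreated(cmd.userId(), cmd.email()));
    }
}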
This solution relies a bit on eventual consistency: there is a window where the query tells us the field has not been used and lets the command succeed, raising a UserCreated event, when the projection simply hadn't been updated yet from a previous transaction, leaving two values in the system that were supposed to be globally unique.
If you want to completely avoid these situations because your business can't deal with eventual consistency, my recommendation is to handle this in your domain by explicitly modeling it as part of your ubiquitous language. For example, you could model your aggregates differently, since it's obvious that your aggregate User is not really your transactional boundary (i.e. it depends on others).
As often, there's no right answer, only answers that fit your domain.
Are you in an environment that really requires immediate consistency? What would be the odds of an identical user name being created between the moment uniqueness is checked by querying (say, at the client side) and the moment the command is processed? Would your domain experts tolerate, for instance, one user name conflict out of a million (which can be compensated afterwards)? Will you have a million users in the first place?
Even if immediate consistency is required, "user names should be unique"... in which scope? A Company? An OnlineStore? A GameServerInstance? Can you find the most restricted scope in which the uniqueness constraint must hold and make that scope the Aggregate Root from which to sprout a new user? Why would the "replay all the UserAdded/UserEdited events" solution be bad after all, if the Aggregate Root makes these events small and simple?
With GetEventStore (from Greg Young) you can use any string as your aggregateId/StreamId. Use the username as the id of the aggregate instead of a GUID, or a combination like "mycompany.users.john" as the key, and... voilà! You get user name uniqueness for free!
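A minimal sketch of that idea; the "mycompany.users." prefix and the lower-casing rule are just examples, and it assumes the creation event is appended with an "expected version: no stream" check so a second attempt is rejected:

// Derive the stream id deterministically from the username, so two attempts to
// register the same name target the same stream; appending the creation event
// with an expected-version-of-no-stream check then rejects the second attempt.
public final class UserStreamId {
    private UserStreamId() {}

    public static String of(String userName) {
        return "mycompany.users." + userName.trim().toLowerCase();
    }
}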
I've been trying to wrap my head around this one for weeks, but I just can't seem to figure it out. MongoDB isn't equipped to deal with rollbacks as we typically understand them (i.e. a client adds information to the database, like a username, but quits in the middle of the registration process, and now the DB is left with some "hanging" information that isn't associated with anything). How can MongoDB handle that? Or if no one can answer that question, maybe they can point me to a source/example that can? Thanks.
MongoDB does not support multi-statement transactions, so you can't use them to ensure consistency; atomicity is only guaranteed for an operation on a single document at a time. When dealing with NoSQL databases you need to validate your data as much as you can, because they seldom complain about anything. There are some workarounds and patterns to achieve SQL-like transactions. For example, in your case, you can store the user's information in a temporary collection, check its validity, and store it in the user's collection afterwards.
This should be straightforward, but things get more complicated when we deal with multiple documents. In that case you need to create a designated collection for transactions. For instance:
transaction collection:
{
    "id": ...,
    "state": "new_transaction",
    "value1": <values from document_1 before updating document_1>,
    "value2": <values from document_2 before updating document_2>
}
// update document 1
// update document 2
Ooohh!! something went wrong while updating document 1 or 2? No worries, we can still restore the old values from the transaction collection.
This pattern is known as compensation, and it mimics the transactional behavior of SQL.
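A rough sketch of that compensation flow with the MongoDB Java driver; the collection names, ids and the balance field are made up for illustration, and it assumes both documents already exist:

import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.MongoDatabase;
import com.mongodb.client.model.Filters;
import com.mongodb.client.model.Updates;
import org.bson.Document;

public class CompensationExample {
    public static void main(String[] args) {
        MongoDatabase db = MongoClients.create("mongodb://localhost:27017").getDatabase("app");
        MongoCollection<Document> transactions = db.getCollection("transactions");
        MongoCollection<Document> accounts = db.getCollection("accounts");

        // 1. Record the "before" values so we can compensate later.
        Document before1 = accounts.find(Filters.eq("_id", "doc1")).first();
        Document before2 = accounts.find(Filters.eq("_id", "doc2")).first();
        Document tx = new Document("_id", "tx-42")
                .append("state", "new_transaction")
                .append("value1", before1)
                .append("value2", before2);
        transactions.insertOne(tx);

        try {
            // 2. Update document 1, then document 2.
            accounts.updateOne(Filters.eq("_id", "doc1"), Updates.set("balance", 90));
            accounts.updateOne(Filters.eq("_id", "doc2"), Updates.set("balance", 110));
            transactions.updateOne(Filters.eq("_id", "tx-42"), Updates.set("state", "done"));
        } catch (RuntimeException e) {
            // 3. Something went wrong: restore the old values from the transaction record.
            accounts.replaceOne(Filters.eq("_id", "doc1"), (Document) tx.get("value1"));
            accounts.replaceOne(Filters.eq("_id", "doc2"), (Document) tx.get("value2"));
            transactions.updateOne(Filters.eq("_id", "tx-42"), Updates.set("state", "rolled_back"));
        }
    }
}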
We have two entities, User and Role. One User can have multiple Roles, and a single Role can be shared by many Users - a typical m:n relation.
Roles are also dynamic, and we expect a large number of them (millions).
It is quite simple to model such data in a relational DB. I would like to find out whether it is possible in Cassandra.
Currently I see two solutions:
A) Use a normalized model and create something similar to an inner join
Create each single Role in a separate CF and store, in the User record, foreign keys to the referenced Roles.
pro: Roles are not replicated and maintenance is simple.
contra: In order to get all Roles for a single User, multiple network calls are necessary. The User record contains only FKs; Roles are stored using the random partitioner, so each Role could be stored on a different Cassandra node.
B) Denormalize the model and replicate Roles to avoid round trips
In this scenario the User record in Cassandra contains a copy of all of the user's Roles.
pro: It is possible to read a User with all Roles in a single query. This guarantees short load times.
contra: Each shared Role is copied multiple times - onto each related User. Maintaining Roles is very difficult, especially with a large amount of data. For example: one Role is shared by 1000 Users, so changes to that Role require updates on 1000 User records. For very large data sets such updates have to be executed as an asynchronous job.
The solutions above are very limited; maybe Cassandra is not the right solution for m:n relations? Do you know any Cassandra design pattern for such a problem?
Thanks,
Maciej
The way you want to design a data store in Cassandra is to start with the queries you plan to execute and make it so you can get all the information you need at once. Denormalization is the name of the game here; if you're not replicating that role information in each user node, you're not going to avoid disk seeks, and your read performance will suffer. Joins do not make sense; if you want a relational database, use a relational database.
At a guess, you're going to ask a lot of questions about what roles a user has and what they should be doing with them, so you definitely want to have role information duplicated in each user entry - probably with each role getting its own column (role-ROLE_KEY => serialized-capability-info instead of roles => [serialized array of capability info]). Your application will need some way to iterate over all those columns itself.
You will probably want to look at what users are in a role, and so you should probably store all the user information you'll need for that view in the role column family as well (though a subset of the full user record will do).
When you run updates, and add/remove users from roles, you will need to make sure that you update both the role's list of users and the user's roles at the same time. Because you're using a column for each relation, instead of a single shared serialized blob, this should work even if you're editing two different roles that share the same user at the same time: Cassandra can merge the updates, including the deletes.
If the query needs to be asynchronous, then go make your application handle it. Remember that Cassandra is an eventual-consistency data store and you shouldn't expect updates to be visible everywhere immediately anyway.
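In modern CQL terms, a clustering column per role plays the part of the dynamic columns described above. A sketch with the DataStax Java driver; the table and column names are made up for illustration:

import com.datastax.oss.driver.api.core.CqlSession;

public class UserRoleModel {
    public static void main(String[] args) {
        try (CqlSession session = CqlSession.builder().build()) {
            // One partition per user with one row per role, plus the mirror view for "who is in a role".
            session.execute("CREATE TABLE IF NOT EXISTS roles_by_user ("
                    + "user_id text, role_key text, capability_info text, "
                    + "PRIMARY KEY (user_id, role_key))");
            session.execute("CREATE TABLE IF NOT EXISTS users_by_role ("
                    + "role_key text, user_id text, user_summary text, "
                    + "PRIMARY KEY (role_key, user_id))");

            // When assigning a role, write both sides together so the two views
            // cannot drift apart on failure (a logged batch, at the cost of latency).
            session.execute("BEGIN BATCH "
                    + "INSERT INTO roles_by_user (user_id, role_key, capability_info) "
                    + "VALUES ('john', 'admin', '{\"can\": \"everything\"}'); "
                    + "INSERT INTO users_by_role (role_key, user_id, user_summary) "
                    + "VALUES ('admin', 'john', 'John D.'); "
                    + "APPLY BATCH");

            // Reading all roles for a user is then a single-partition query.
            session.execute("SELECT role_key, capability_info FROM roles_by_user WHERE user_id = 'john'");
        }
    }
}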
Another option these days is to use playORM, which can do joins for you ;). You just decide how to partition your data. It uses Scalable JQL, which is a simple addition to JQL, as follows:
@NoSqlQuery(name="findJoinOnNullPartition", query="PARTITIONS t('account', :partId) select t FROM Trade as t INNER JOIN t.security as s where s.securityType = :type and t.numShares = :shares")
So, we can finally normalize our data on a noSQL system AND scale at the same time. We don't need to give up normalization which has certain benefits.
Dean
I am stuck on a design problem. I want to assign ranks to user records in a table. Users do some action on the site and are given a rank on the basis of a leaderboard, and the selects I want on them could be Top 10, a user's position, Top 10 logged in today, etc.
I just cannot find a way to store this in an Azure table. Then I thought about storing a custom collection object (a sorted list) in a blob.
Any suggestions?
Table entities are sorted by PartitionKey, RowKey. While you could continually delete and recreate users (thus allowing you to change the PK, RK) to give the correct order, it seems like a bad idea or at least overkill. Instead, I would probably store the data that you use to compute the rankings and periodically compute and store the rankings (as you say). We do this a lot in our work - pre-compute what the data should look like in JSON view, store it in a blob, and let the UI query it directly. The trick is to decide when to re-compute the view. After a user does something that would cause the rankings to change, I would probably queue a message and let a worker process go and re-compute the view. This prevents too many workers from trying to update the data at once.
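A hypothetical sketch of that flow; ScoreStore, BlobStore and the JSON shape are placeholders rather than Azure SDK types, and the worker is assumed to run when the queued "recompute" message is dequeued:

import java.util.Comparator;
import java.util.List;

interface ScoreStore { List<UserScore> loadAllScores(); }
interface BlobStore { void putJson(String blobName, String json); }

record UserScore(String userId, long score) {}

class LeaderboardRecomputeWorker {
    private final ScoreStore scores;
    private final BlobStore blobs;

    LeaderboardRecomputeWorker(ScoreStore scores, BlobStore blobs) {
        this.scores = scores;
        this.blobs = blobs;
    }

    // Invoked when a "rankings changed" message is dequeued; only this worker writes the view.
    void onRecomputeMessage() {
        List<UserScore> ranked = scores.loadAllScores().stream()
                .sorted(Comparator.comparingLong(UserScore::score).reversed())
                .toList();

        // Pre-compute the JSON view the UI will query directly from blob storage.
        StringBuilder json = new StringBuilder("[");
        for (int i = 0; i < ranked.size(); i++) {
            UserScore u = ranked.get(i);
            if (i > 0) json.append(',');
            json.append("{\"rank\":").append(i + 1)
                .append(",\"userId\":\"").append(u.userId()).append("\"")
                .append(",\"score\":").append(u.score()).append('}');
        }
        json.append(']');

        blobs.putJson("leaderboard/top.json", json.toString());
    }
}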