DDD: About a design decision

I have to solve a domain problem and I have some doubts about the best solution. Let me present the problem:
I have Applications, and each Application has many Processes. An Application also has some ProcessSettings. There are business rules to apply when I create a Process; for example, based on the application's process settings, I have to apply some rules to certain process properties.
I have modeled Application as an aggregate root, Process as another aggregate root, and ProcessSettings as a value object inside the Application aggregate.
I have a use case to create processes; the logic is to create a valid Process instance and persist it with ProcessRepository. I see two options for applying the process settings:
In the use case, get the process settings from the Application aggregate by ApplicationId through a domain service, and pass the ProcessSettings to the Process creation method.
In the use case, create the process and then, through a domain service on the Application aggregate, pass a copy of the process (as a value object) so the process settings can be applied.
Which approach do you believe is more correct? Or would you implement it another way?
Thanks in advance!

Our product owner told us that if the client paid for some settings at the
moment a process was created, those settings remain valid for that
process as long as the client does not update it. If the client stops paying
for some settings, then when the client wants to update that process, our
system will not allow the update, because the current settings will no
longer fit the process data.
That makes the implementation much easier, given that process settings-based validation only has to occur in process creation/update scenarios. Furthermore, I would guess that race conditions would also be irrelevant to the business, such as if settings are changed at the same time a process gets created/updated.
In light of this, we can assume that ProcessSettings and Process can be in distinct consistency boundaries. In other words, both can be part of separate aggregate roots.
Furthermore, it's important to recognize that the settings-based validations are not Process invariants, meaning the Process shouldn't be responsible for enforcing these rules itself. Since these aren't invariants, you shouldn't strive for an always-valid strategy here; use a deferred validation strategy instead.
From that point there are many good ways of modeling this use case, which will all boil down to something like:
//Application layer service
void createProcess(processId, applicationId, data) {
    application = applicationRepository.applicationOfId(applicationId);
    process = application.createProcess(processId, data);
    processRepository.add(process);
}

//Application AR
Process createProcess(processId, data) {
    process = new Process(processId, this.id, data);
    this.processSettings.ensureRespectedBy(process);
    return process;
}
If ProcessSettings is part of the Application AR, then it could make sense to put a factory method on Application for creating processes, given it holds the necessary state to perform the validation, like in the above example. That removes the need to introduce a dedicated domain service for the task, such as a stand-alone factory.
If ProcessSettings can be its own aggregate root, you could still do the same, but introduce a lookup domain service for settings:
//Application AR
Process createProcess(processId, data, settingsLookupService) {
    process = new Process(processId, this.id, data);
    processSettings = settingsLookupService.findByApplicationId(this.id);
    processSettings.ensureRespectedBy(process);
    return process;
}
Some might say your aggregate is not pure anymore, however, given it performs indirect IO through the settingsLookupService. If you want to avoid such a dependency, you may introduce a domain service such as ProcessDomainService to encapsulate the creation/update logic, or you may even decide the lookup logic is simple enough to put directly in the application layer:
//Application layer service
void createProcess(processId, applicationId, data) {
    application = applicationRepository.applicationOfId(applicationId);
    processSettings = processSettingsRepository.findByApplicationId(applicationId);
    process = application.createProcess(processId, data, processSettings);
    processRepository.add(process);
}
There's no way for us to tell which approach is better in your specific scenario; sometimes there isn't even a single perfect way, and several approaches could be equally good. In my experience it's a good idea to keep aggregates pure, though, as it makes unit testing easier (less mocking).
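To illustrate that last point, here's a minimal unit-test sketch, assuming hypothetical Application, ProcessSettings, and Process types shaped like the examples above (no repository or service mocks are needed because the aggregate performs no IO):

using Xunit;

public class ApplicationTests
{
    [Fact]
    public void CreateProcess_EnforcesProcessSettings()
    {
        // Arrange: build the aggregate entirely in memory.
        var settings = new ProcessSettings(maxRetries: 3);
        var application = new Application(new ApplicationId("app-1"), settings);
        var data = new ProcessData("some payload");

        // Act: the factory method validates against the settings.
        var process = application.CreateProcess(new ProcessId("proc-1"), data);

        // Assert: the settings were enforced at creation time.
        Assert.Equal(3, process.MaxRetries);
    }
}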

Related

DDD: Can aggregates get other aggregates as parameters?

Assume that I have two aggregates, Vehicle and Driver, and a rule that a vehicle cannot be assigned to a driver if the driver is on vacation.
So, my implementation is:
class Vehicle {
    public void assignDriver(Driver driver) {
        if (driver.isInVacation()) {
            throw new Exception();
        }
        // ....
    }
}
Is it ok to pass an aggregate to another one as a parameter? Am I doing anything wrong here?
I'd say your design is perfectly valid and reflects the Ubiquitous Language very well. There are several examples in the Implementing Domain-Driven Design book where an AR is passed as an argument to another AR.
e.g.
Forum#moderatePost: Post is not only provided to Forum, but modified by it.
Group#addUser: User provided, but translated to GroupMember.
If you really want to decouple, you could also do something like vehicle.assignDriver(driver.id(), driver.isInVacation()), or introduce some kind of intermediary VO that holds only the state from Driver necessary to make an assignment decision.
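For instance, a minimal sketch of that intermediary VO (DriverAvailability and the exception message are assumed names, not from the question):

using System;

// Hypothetical value object carrying only the state needed for the decision.
public record DriverAvailability(Guid DriverId, bool IsOnVacation);

public class Vehicle
{
    public void AssignDriver(DriverAvailability driver)
    {
        if (driver.IsOnVacation)
            throw new InvalidOperationException("Driver is on vacation.");
        // ... store driver.DriverId on the vehicle
    }
}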
However, note that any decision made using external data is considered stale. For instance, what happens if the driver goes on vacation right after being assigned to a vehicle?
In such cases you may want to use exception reports (e.g. list all vehicles with an unavailable driver), flag vehicles for driver re-assignment, etc. Eventual consistency could be achieved either through batch processing or messaging (event processing).
You could also seek to make the rule strongly consistent by inverting the relationship, where Driver keeps a set of vehicleIds it drives. Then you could use a DB unique constraint to ensure the same vehicle doesn't have more than one driver assigned. You could also violate the rule of modifying only one AR per transaction and model the two-way relationship to protect both invariants in the model.
However, I'd advise you to think of the real-world scenario here. I doubt you can prevent a driver from going away. The system must reflect the real world, which is probably the book of record for that scenario, meaning the best you can do with strong consistency is probably to unassign a driver from all their vehicles while they're away. In that case, is it really important that vehicles get unassigned immediately in the same TX, or would a delay be acceptable?
In general, an aggregate should keep its own boundaries (to avoid data-load issues and transaction-scoping issues; check this page for example), and therefore only reference another aggregate by identity, e.g. assignDriver(driverId).
That means you would have to query the driver prior to invoking assignDriver, in order to perform the validation check:
class MyAppService {
    public void execute() {
        // Get driver...
        if (driver.isInVacation()) {
            throw new Exception();
        }
        // Get vehicle...
        vehicle.assignDriver(driver.id);
    }
}
Suppose you're in a micro-services architecture: you have a 'Driver Management' service and an 'Assignation' service, and you're not sharing code between the two apart from technical libraries.
You'll naturally have two classes for 'Driver':
an aggregate in 'Driver Management', which holds the operations to manage the state of a driver;
and a value object in the 'Assignation' service, which only contains the information relevant to assignment.
This separation is harder to see/achieve in a monolithic codebase.
I also agree with @plalx; there's more to enforcing the rule than a check on creation, for which you could implement one of the solutions he suggested.
I encourage you to think in events. What happens when:
a driver has scheduled a vacation
he's back from vacation
he changes his vacation dates
Did you explore creating an Aggregate for Assignation?

What is the best way to rehydrate aggregate roots and their associated entities in an event sourced environment

I have seen information on rehydrating aggregate roots on SO, but I am posting this question because I did not find any information about doing so within the context of an event-sourced framework.
Has a best practice been discovered or developed for how to rehydrate aggregate roots when operating on the command side of an application using the event sourcing and CQRS patterns, or is this still more of a "preference" among architects?
I have read through a number of blogs and watched a number of conference presentations on YouTube, and I seem to get different guidance depending on whom I am listening to.
On the one hand, I have found information stating fairly clearly that developers should create aggregates that hydrate themselves using "apply" methods on events obtained directly from the event store.
On the other hand, I have also seen in several places presenters and bloggers recommending rehydrating aggregate roots by submitting a query to the read side of the application. Some have suggested creating specific validation "buckets" / projections on the read side to facilitate this.
Can anyone help point me in the right direction on discovering if there is a single best practice or if the answer primarily depends upon performance issues or some other issue I am not thinking about?
Hydrating Aggregates in an event sourced framework is a well-understood problem.
On the one hand, I have found information stating fairly clearly that
developers should create aggregates that hydrate themselves using
"apply" methods on events obtained directly from the event store.
This is the prescribed way of handling it. There are various ways of achieving this, but I would suggest keeping any persistence logic (reading or writing events) outside of your Aggregate. One simple way is to expose a constructor that accepts domain events and then applies those events.
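As a sketch of that idea (a hypothetical Order aggregate, not any particular framework's API):

using System.Collections.Generic;

public class Order
{
    // Rehydration constructor: replays history to rebuild the current state.
    // Persistence (reading the events) stays outside the aggregate.
    public Order(IEnumerable<object> history)
    {
        foreach (var @event in history)
            When(@event); // mutates state only; nothing is appended to pending changes
    }

    private void When(object @event)
    {
        switch (@event)
        {
            // case OrderPlaced e: ... mutate state ...
        }
    }
}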
On the other hand, I have also seen in several places presenters
and bloggers recommending rehydrating aggregate roots by
submitting a query to the read side of the application. Some have
suggested creating specific validation "buckets" / projections on the
read side to facilitate this.
You can use the concept of snapshots as a way of optimizing your reads. This will create a memoized version of your hydrated Aggregate. You can load this snapshot and then only apply events that were generated since the snapshot was created. In this case, your Aggregate can define a constructor that takes two parameters: an existing state (snapshot) and any remaining domain events that can then be applied to that snapshot.
Snapshots are just an optimization and should be considered as such. You can create a system that does not use snapshots and apply them once read performance becomes a bottleneck.
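Continuing the hypothetical Order sketch above, such a snapshot-aware constructor might look like this (OrderSnapshot and RestoreFrom are assumed names):

// Snapshot-aware rehydration: start from the memoized state, then apply only
// the events recorded after the snapshot was taken.
public Order(OrderSnapshot snapshot, IEnumerable<object> eventsSinceSnapshot)
{
    RestoreFrom(snapshot);                    // copy the snapshot state onto the aggregate
    foreach (var @event in eventsSinceSnapshot)
        When(@event);                         // replay only the tail of the stream
}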
On the other hand, I have also seen in several places presenters
and bloggers recommending rehydrating aggregate roots by
submitting a query to the read side of the application.
Snapshots are not really part of the read side of the application. Data on the read side exists to satisfy use cases within the application. Those can change based on requirements even if the underlying domain does not change. As such, you shouldn't use read side data in your domain at all.
Event sourcing has developed different styles over the years. I could divide all of those into two big categories:
an event stream represents one entity (an aggregate in case of DDD)
one (partitioned) event stream for a (sub)system
When you deal with one stream per (sub)system, you aren't able to rehydrate the write side on the fly; it is physically impossible due to the number of events in that stream. Therefore, you would rely on the projected read side to retrieve the current entity state. As a consequence, this read side must be fully consistent.
When going with DDD-flavoured event sourcing, there's a strong consensus in the community about how it should be done. The state of the aggregate (not just the root, but the whole aggregate) is restored by the command side before calling the domain model. You always restore from events. When snapshotting is enabled, snapshots are also stored as events in the aggregate's snapshot stream, so you read the last one and then all events from the snapshot version onwards.
Concerning the Apply thing: you need to clearly separate the function that adds new events to the changes list (what you're going to save) from the functions that mutate the aggregate state when events are applied.
The first function is the one called Apply, and the second one is often called When. So you call the Apply function in your aggregate code to build up the change list. The When function is called when restoring the aggregate state from events as you read the stream, and also from the Apply function.
You can find a simplistic example of an event-sourced aggregate in my book repo: https://github.com/alexeyzimarev/ddd-book/blob/master/chapter13/src/Marketplace.Ads.Domain/ClassifiedAds/ClassifiedAd.cs
For example:
public void Publish(UserId userId)
    => Apply(
        new V1.ClassifiedAdPublished
        {
            Id = Id,
            ApprovedBy = userId,
            OwnerId = OwnerId,
            PublishedAt = DateTimeOffset.Now
        }
    );
And for the When:
protected override void When(object @event)
{
    switch (@event)
    {
        // more code here
        case V1.ClassifiedAdPublished e:
            ApprovedBy = UserId.FromGuid(e.ApprovedBy);
            State = ClassifiedAdState.Active;
            break;
        // and more here
    }
}

DDD - How to modify several ARs (from different bounded contexts) in a single request?

I want to expose a little scenario which is still at the paper stage, and which, regarding DDD principles, seems a bit tedious to accomplish.
Let's say I have an application for hosting account management. Basically, the application comprises several bounded contexts such as web account management, FTP account management, mail account management... each of them represented by its own AR (they can live standalone).
Now, let's imagine I want to provide a UI with an HTML form that composes one fieldset for each bounded context, for instance to update limits and/or features. How exactly should I proceed to update all the ARs without breaking the single-transaction-per-request principle? Can I create a kind of "outer" AR, say a ClientHostingProperties AR, which would hold references to the other ARs and update them as part of a single transaction using its own repository? Or would it be better to create an AR that emits messages that listeners provided by the bounded contexts react to, in which case I should probably think about ES?
Thanks.
How exactly should I proceed to update all the ARs without breaking the single-transaction-per-request principle?
You are probably looking for a process manager.
Basic sketch: persisting the details from the submitted form is a transaction unto itself (you are offered an opportunity to accrue business value; step 1 is to capture that opportunity).
That gives you a way to keep track of whether or not this task is "done": you compare the changes in the task to the state of the system, and fire off commands (to run in isolated transactions) to make changes.
Processes, in my mind, end up looking a lot like state machines: these commands are done, those commands are not done, these commands have failed; now what? Eventually you reach a state where there are no additional changes to be made, and this instance of the process is "done".
Short answer: You don't.
An aggregate is a transactional boundary, which means that if you would update multiple aggregates in one "action", you'd have to use multiple transactions. The reason for an aggregate to be equivalent to one transaction is that this allows you to guarantee consistency.
This means that you have two options:
You can make your aggregate larger. Then you can actually guarantee consistency, but your ability to handle concurrent requests gets worse. So this is usually what you want to avoid.
You can live with the fact that it's two transactions, which means you are eventually consistent. If so, you usually use something such as a process manager or a flow to handle updating multiple aggregates. In its simplest form, a flow is nothing but a simple "if this event happens, run that command" rule; in its more complex form, it has its own state (see the sketch below).
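For instance, a minimal stateless flow for this hosting scenario might look like this; ICommandBus, the event, and the command names are assumptions, not part of the question:

public class ClientLimitsFlow
{
    private readonly ICommandBus bus;

    public ClientLimitsFlow(ICommandBus bus) => this.bus = bus;

    // "If this event happens, run that command": each command is handled by
    // one aggregate in its own transaction, so the whole is eventually consistent.
    public void Handle(ClientLimitsFormSubmitted e)
    {
        bus.Send(new UpdateWebAccountLimits(e.ClientId, e.WebLimits));
        bus.Send(new UpdateFtpAccountLimits(e.ClientId, e.FtpLimits));
        bus.Send(new UpdateMailAccountLimits(e.ClientId, e.MailLimits));
    }
}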
Hope this helps 😊

CQRS read model projection - business logic

So, I trigger a command on an aggregate root, and some 10 events happen as a result of that command. These events are internal ones, and since outer systems need an aggregation of these events, I decided to make a projection (a read projection, basically). In order to turn these 10 (internal) events into 1 (external) event, I have to apply some business rules (rules concerning the merging of events). Where should I put these rules, since they seem like part of the domain, yet I'm creating projections of internal events?
Basically, since the projection logic is part of the domain, should I keep it inside the aggregate and call it in the code where the projection is made?
UPDATE
So, inside one aggregate root I have e.g. 3 internal events in response to one command (aggregate.createPaintandwashatsametime(id, red)) that is sent to the aggregate root and that spread through all the aggregate root's entities: CarCreated(Id), CarSeatColored(Red), CarWashed(), etc. (all 3 of these events happen because of a single command). The external system expects to receive one external event, such as CarMaintainenceDone(Id, repainted=true, washed=true, somevalue=22);
Now, if I have some complex logic to produce this CarMaintainenceDone event (like: if color==red, then somevalue==22 in the projection, otherwise 44), should this go in the projection code or be part of the domain?
UPDATE 2
Let me try to give you a new example. Just ignore how the domain is modeled, since this is just an example:
As you can see, we have an AggregateRoot that contains a Multiplier, which is there just to call things by the right name. When we do a multiplication, we first send the integer 1 to ObjectA, which has some logic to set internal state and emit an ObjectAHasSetParam event. The same goes for ObjectB. Finally, ObjectC listens to all of these events and, on paramsHaveBeenSet, does the actual multiplication.
In the event store, in this case, I would preserve the list of events:
[ObjectAHasSetParam, ObjectBHasSetParam, ObjectCHasMultiplied]
My point here was: if I emit all of these events one by one out of process, the state that somebody else updates may be inconsistent, since these 3 events make sense only together. That is why I wanted to make something like a projection, but I think in this case I just need to publish the list of these events together instead of event by event.
class AggregateRoot {
    Multiplier ml;
    void handle(MultiplyCommand(1, 2)) {
        ml.multiply(1, 2);
    }
}

class Multiplier {
    ObjectA a;
    ObjectB b;
    ObjectC res;
    void multiply(1, 2) {
        a.setParam(1);
        b.setParam(2);
        publish(paramsHaveBeenSet());
    }
}

class ObjectA {
    int p;
    void setParam(1) {
        p = 1 + 11;
        publish(ObjectAHasSetParam(12));
    }
}

class ObjectB {
    int p;
    void setParam(2) {
        p = 2 + 22;
        publish(ObjectBHasSetParam(24));
    }
}

class ObjectC {
    int p1; int p2;
    int res;
    listen(ObjectAHasSetParam e1) {
        p1 = e1.par;
    }
    listen(ObjectBHasSetParam e2) {
        p2 = e2.par;
    }
    listen(paramsHaveBeenSet e3) {
        res = p1 * p2;
        publish(ObjectCHasMultiplied(288));
    }
}
The external system expects to receive one external event, such as CarMaintainenceDone(Id, repainted=true, washed=true, somevalue=22);
Aha! The short answer is: process manager.
The longer answer is that you (should) have two aggregates right now. One of them is tracking the state of the car. The other is tracking the process of maintaining the car.
The big hint that there is another aggregate hidden somewhere: you've got this CarMaintenanceDone event, with no aggregate responsible for generating it. All events have an "aggregate" somewhere that produces them. The aggregate might be the real world, or a proxy for the real world (HttpRequestReceived), or a digital thing in some other bounded context; but the event is telling you that something, somewhere, changed state.
That is to say, you have some aggregate that knows the rule of when the maintenance is done. It's an information resource, a log of work. When CarWashed is published (by the Car, or the washing machine, or whatever), an event handler subscribed to the CarWashed event sends a command to the Maintenance aggregate to inform it. The Maintenance aggregate updates its own state, runs its logic, and publishes a MaintenanceCompleted event when all of the individual steps have been accounted for.
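Sketched in code, such a Maintenance aggregate might look like this (CarId, Publish, and the step names are assumptions based on the discussion above, not a definitive implementation):

public class Maintenance
{
    private readonly CarId carId;
    private bool washed;
    private bool repainted;

    public Maintenance(CarId carId) => this.carId = carId;

    // "Commands" on a process-like aggregate tend to look like event handlers.
    public void RecordCarWashed()    { washed = true;    CompleteIfDone(); }
    public void RecordCarRepainted() { repainted = true; CompleteIfDone(); }

    private void CompleteIfDone()
    {
        if (washed && repainted)
            Publish(new CarMaintenanceDone(carId, repainted, washed)); // Publish is assumed infrastructure
    }
}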
Most things that are process-like can be implemented as Aggregates; the weird bit is that the "commands" tend to look like event handlers. But they have their own history (based on what they have observed), which describes how the state machine changed in response to each event observed.
It might be more than two, depending on the complexity of the processes.
Rinat Abdullin wrote a good introduction to process managers, that I reference frequently.
Isn't there a clear distinction between an aggregate and a process manager though? I thought process managers would only coordinate and live in the application service world, sending appropriate commands to aggregates based on the events received.
From what I've seen -- no, there isn't. The literature doesn't make that very clear.
For example, Udi Dahan wrote
Here’s the strongest indication I can give you to know that you’re doing CQRS correctly: Your aggregate roots are sagas.
Saga, here, being equivalent to a process.
There are often two event models: internal events (only visible within a BC) and external events (published to the outside world). You could decide to make everything external, but then you have to version everything.
You can read more about internal vs external events in the Patterns, Principles, and Practices of Domain-Driven Design book p.408 (scroll up a bit in the link).
Projections shouldn't be responsible for publishing external events. One common practice would be to register an internal event handler from the application service layer that is responsible for publishing external events on a messaging infrastructure. You could leverage that process to aggregate these events together and publish a single external event from them.
How the aggregation is performed would be up to you, but since internal events can be raised synchronously and handlers are usually single-threaded, you can just set up a state machine in the handler that kicks in when it receives the first event of the batch, aggregates events until it receives the last one, and then publishes on the message bus.
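As a rough sketch of such a handler (IMessageBus, the event shapes, and the batch-closing event are all assumed names, not from the question):

using System;

public class CarMaintenanceEventAggregator
{
    private readonly IMessageBus bus;
    private Guid carId;
    private bool repainted;
    private bool washed;

    public CarMaintenanceEventAggregator(IMessageBus bus) => this.bus = bus;

    public void Handle(CarSeatColored e) { carId = e.CarId; repainted = true; }
    public void Handle(CarWashed e)      { carId = e.CarId; washed = true; }

    // Assumed final internal event that closes the batch.
    public void Handle(CarMaintenanceFinished e)
    {
        bus.Publish(new CarMaintenanceDone(carId, repainted, washed, someValue: 22));
        repainted = washed = false; // reset for the next batch
    }
}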
If your messaging infrastructure cannot participate in the same transaction as your event store you could just have an additional process that reads the committed events in order and does the same thing as above.
An alternative would be to let the consumer deal with the aggregation. That could be the right approach if the consumer should be able to veto what "CarMaintenanceDone" means.
Finally, you could also publish an extra event from the aggregate itself. The event may not be leveraged by the AR itself, but sometimes it's better to just do what's more practical (just like enriching events with data only consumed by the read model). This approach would also have the advantage of not having to change the logic if more events are added.
There should not be a notion of an external event. Events are generated by the Aggregates and consumed by synchronous read models or sagas, or published to the outside world, where other systems and microservices use them however they want.
So, in your case, the consumer (implemented as a saga, for example), and not the Aggregate, should aggregate those events according to its business rules and then do something (a saga can create a new command, for example).
UPDATE (in response to question being updated)
If you think that car maintenance is a responsibility of the Car Aggregate, then the Car aggregate should raise the event. It depends on how the future behavior of the Car Aggregate is influenced by that CarMaintainenceDone event. In this particular context, I would generate the event from the Car aggregate, to make the code simpler.

How to avoid concurrency issues when scaling writes horizontally?

Assume there is a worker service that receives messages from a queue, reads the product with the specified Id from a document database, applies some manipulation logic based on the message, and finally writes the updated product back to the database (a).
This work can be safely done in parallel when dealing with different products, so we can scale horizontally (b). However, if more than one service instance works on the same product, we might end up with concurrency issues, or concurrency exceptions from the database, in which case we should apply some retry logic (and still the retry might fail again and so on).
Question: How do we avoid this? Is there a way I can ensure two instances are not working on the same product?
Example/Use case: An online store has a great sale on productA, productB and productC that ends in an hour and hundreds of customers are buying. For each purchase, a message is enqueued (productId, numberOfItems, price). Goal: How can we run three instances of our worker service and make sure that all messages for productA will end up in instanceA, productB to instanceB and productC to instanceC (resulting in no concurrency issues)?
Notes: My service is written in C#, hosted on Azure as a Worker Role, I use Azure Queues for messaging, and I'm thinking of using Mongo for storage. Also, the entity IDs are GUIDs.
It's more about the technique/design, so if you use different tools to solve the problem I'm still interested.
Any solution attempting to divide the load between different items in the same collection (like orders) is doomed to fail. The reason is that if you have a high rate of transactions flowing, you'll have to start doing one of the following things:
let the nodes talk to each other ("hey guys, is anyone working on this?")
divide the ID generation into segments (node A creates IDs 1-1000, node B 1001-1999, etc.) and then let each node deal with its own segment
dynamically divide the collection into segments (and let each node handle a segment)
So what's wrong with those approaches?
The first approach is simply replicating transactions in a database. Unless you can spend a large amount of time optimizing the strategy, it's better to rely on transactions.
The last two options will decrease performance, as you have to dynamically route messages based on IDs and also change the strategy at run time to include newly inserted messages. They will eventually fail.
Solutions
Here are two solutions that you can also combine.
Retry automatically
You presumably have an entry point somewhere that reads from the message queue.
In it you have something like this:
while (true)
{
    var message = queue.Read();
    Process(message);
}
What you could do instead to get very simple fault tolerance is to retry upon failure:
while (true)
{
    for (var i = 0; i < 3; i++)
    {
        try
        {
            var message = queue.Read();
            Process(message);
            break; // exit for loop
        }
        catch (Exception ex)
        {
            // log
            // no throw = the for loop runs the next attempt
        }
    }
}
You could of course catch only DB exceptions (or rather transaction failures) and replay just those messages.
Micro services
I know, microservices is a buzzword, but in this case it's a great solution. Instead of having a monolithic core which processes all messages, divide the application into smaller parts. Or, in your case, just deactivate the processing of certain types of messages per node.
If you have five nodes running your application, you can make sure that node A receives messages related to orders, node B receives messages related to shipping, etc.
By doing so you can still scale your application horizontally, you get no conflicts, and it requires little effort (a few more message queues and reconfiguring each node).
For this kind of thing I use blob leases. Basically, I create a blob with the ID of the entity in some known storage account. When worker 1 picks up the entity, it tries to acquire a lease on the blob (and to create the blob itself, if it doesn't exist). If it is successful in doing both, then I allow the processing of the message to occur. Always release the lease afterwards.
If I am not successful, I dump the message back onto the queue.
I follow the approach originally described by Steve Marx here http://blog.smarx.com/posts/managing-concurrency-in-windows-azure-with-leases, although tweaked to use the new storage libraries.
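In outline, the pattern looks something like this; ILeaseClient, TryAcquireAsync, and RequeueAsync are assumed abstractions rather than the actual Storage SDK API (the linked post shows the real calls):

async Task HandleAsync(QueueMessage message)
{
    var blobName = $"locks/{message.EntityId}";  // one well-known blob per entity
    var lease = await leaseClient.TryAcquireAsync(blobName, TimeSpan.FromSeconds(30));
    if (lease == null)
    {
        await queue.RequeueAsync(message);       // another worker owns this entity right now
        return;
    }
    try
    {
        await ProcessAsync(message);             // safe: we hold the entity's lease
    }
    finally
    {
        await lease.ReleaseAsync();              // always release the lease afterwards
    }
}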
Edit after comments:
If you have a potentially high rate of messages all talking to the same entity (as your comment implies), I would redesign your approach somewhere: either the entity structure or the messaging structure.
For example, consider the CQRS design pattern and store the changes from the processing of every message independently, whereby the product entity becomes an aggregate of all changes done to the entity by various workers, sequentially re-applied and rehydrated into a single object.
If you want the database to always be up to date and always consistent with the already-processed units, then you have several updates on the same mutable entity.
In order to comply with this, you need to serialize the updates to the same entity. You can do this by partitioning your data at the producers, by accumulating the events for the entity on the same queue, or by locking the entity in the worker using a distributed lock or a lock at the database level.
You could use an actor model (in the Java/Scala world, Akka) that creates a message queue for each entity or group of entities and processes them serially.
UPDATED
You can try an Akka port to .NET (e.g. Akka.NET).
You can also find a nice tutorial with samples about using Akka in Scala.
But for the general principles, you should read more about the actor model. It has drawbacks nevertheless.
In the end, it comes down to partitioning your data and being able to create a unique, specialized worker (that could be reused and/or restarted in case of failure) for a specific entity.
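As a rough illustration of that idea, here is a minimal actor-like sketch in plain C# (no Akka); Message and ProcessAsync are assumed, and messages are assumed to carry an EntityId:

using System;
using System.Collections.Concurrent;
using System.Threading.Channels;
using System.Threading.Tasks;

// All messages for one entity land in the same single-reader mailbox, so updates
// to an entity are serialized while different entities run in parallel.
class PerEntityDispatcher
{
    private readonly ConcurrentDictionary<Guid, Channel<Message>> mailboxes = new();

    public void Dispatch(Message message)
    {
        var mailbox = mailboxes.GetOrAdd(message.EntityId, _ =>
        {
            var channel = Channel.CreateUnbounded<Message>();
            _ = Task.Run(async () =>
            {
                await foreach (var m in channel.Reader.ReadAllAsync())
                    await ProcessAsync(m); // one message at a time per entity
            });
            return channel;
        });
        mailbox.Writer.TryWrite(message);
    }
}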
I assume you have a means to safely access the product queue across all worker services. Given that, one simple way to avoid conflicts could be to use global per-product queues next to the main queue:
// Queue[X] is the queue for product X
// QueueMain is the main queue
DoWork(ProductType X)
{
    if (Queue[X].empty())
    {
        product = QueueMain.pop()
        if (product.type != X)
        {
            Queue[product.type].push(product)
            return;
        }
    }
    else
    {
        product = Queue[X].pop()
    }
    // process product...
}
Access to the queues needs to be atomic.
You should use a session-enabled Service Bus queue for ordering and concurrency.
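For example, with Azure Service Bus sessions the producer can stamp each message with the product id as its session id, and a session receiver then takes an exclusive lock on that session; a sketch assuming the Azure.Messaging.ServiceBus client:

// Producer: all messages for one product share a session, so they stay ordered
// and are handled by at most one receiver at a time.
await sender.SendMessageAsync(new ServiceBusMessage(body) { SessionId = productId.ToString() });

// Consumer: lock the next available session and drain it exclusively.
ServiceBusSessionReceiver receiver = await client.AcceptNextSessionAsync(queueName);
ServiceBusReceivedMessage message;
while ((message = await receiver.ReceiveMessageAsync(TimeSpan.FromSeconds(5))) != null)
{
    Process(message);                             // your manipulation logic
    await receiver.CompleteMessageAsync(message);
}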
1) Every high scale data solution that I can think of has something built in to handle precisely this sort of conflict. The details will depend on your final choice for data storage. In the case of a traditional relational database, this comes baked in without any additional work on your part. Refer to your chosen technology's documentation for appropriate detail.
2) Understand your data model and usage patterns. Design your datastore appropriately. Don't design for scale that you won't have. Optimize for your most common usage patterns.
3) Challenge your assumptions. Do you actually have to mutate the same entity very frequently from multiple roles? Sometimes the answer is yes, but often you can simply create a new, similar entity to reflect the update; i.e., take a journaling/logging approach instead of a single-entity approach. Ultimately, high volumes of updates on a single entity will never scale.
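A journaling sketch of that idea (the journal API and names are assumptions): each worker appends an immutable change record instead of mutating the product document, and the current state is derived by folding over the journal:

public record ProductChange(Guid ProductId, int QuantityDelta, DateTimeOffset At);

// Worker side: append-only write, no read-modify-write cycle, so no conflict.
await journal.AppendAsync(new ProductChange(productId, -numberOfItems, DateTimeOffset.UtcNow));

// Read side: fold the journal (or a snapshot plus its tail) into current stock.
var changes = await journal.ReadAsync(productId);
int stock = initialStock + changes.Sum(c => c.QuantityDelta);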
