DDD. Shared kernel? Or pure event-driven microservices?

I'm breaking my system into (at least) two bounded contexts: study-design and survey-planning.
There's a concept named "subject" (a potential interviewee) in the study-design context. We also maintain associations between subjects and populations in that context.
Now, in the survey-planning context, we also need (some) information about the subject: for example, to plan a visit, or even to select a questionnaire in advance when the population the subject belongs to is known beforehand.
So, I need that "subject" in both contexts.
Which approach should I pick? A shared kernel, as explained in Eric Evans's DDD book? I don't mind (at least for now) having the two contexts share the same database.
Or should I go pure microservices? That would mean the two can't (or shouldn't) share a database, and in that case I might have to mirror/duplicate the data through event passing: https://www.infoq.com/news/2014/11/sharing-data-bounded-contexts
Any thoughts on which one is better, for the above situation?
Thanks!

Microservices belong in the context of distributed systems; in any other situation they would probably be overkill. A shared kernel will usually split eventually. You may start with one, and there is nothing wrong with that, but it will not stay that way.

I recommend an event-driven solution, but not necessarily microservices. You could build an event-driven monolith, so that you spend much less time synchronizing the two models, and split the monolith into microservices only when the application grows too big. You could use CQRS to split the models even further into write and read models. If you use event sourcing, things get even more interesting.
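A minimal Python sketch of such an event-driven monolith, assuming a hypothetical in-process EventBus and made-up names (SubjectRegistered, StudyDesign, SurveyPlanning) that do not come from the question: the study-design context publishes an event when a subject is registered, and the survey-planning context keeps its own copy of only the subject data it needs.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class SubjectRegistered:
    """Hypothetical event published by the study-design context."""
    subject_id: str
    name: str
    population: Optional[str]


class EventBus:
    """In-process publish/subscribe; could later be replaced by a message broker."""

    def __init__(self) -> None:
        self._handlers: dict[type, list[Callable]] = defaultdict(list)

    def subscribe(self, event_type: type, handler: Callable) -> None:
        self._handlers[event_type].append(handler)

    def publish(self, event: object) -> None:
        # Handlers run synchronously, in the same process and call stack.
        for handler in self._handlers[type(event)]:
            handler(event)


class StudyDesign:
    """Owns the full subject model."""

    def __init__(self, bus: EventBus) -> None:
        self._bus = bus
        self.subjects: dict[str, dict] = {}

    def register_subject(self, subject_id: str, name: str, population: Optional[str] = None) -> None:
        self.subjects[subject_id] = {"name": name, "population": population}
        self._bus.publish(SubjectRegistered(subject_id, name, population))


class SurveyPlanning:
    """Keeps its own, smaller copy of the subject data it needs."""

    def __init__(self, bus: EventBus) -> None:
        self.interviewees: dict[str, dict] = {}
        bus.subscribe(SubjectRegistered, self._on_subject_registered)

    def _on_subject_registered(self, event: SubjectRegistered) -> None:
        # Only what planning needs: who to visit and which population (questionnaire) applies.
        self.interviewees[event.subject_id] = {"name": event.name, "population": event.population}


bus = EventBus()
study = StudyDesign(bus)
planning = SurveyPlanning(bus)
study.register_subject("s-1", "Alice", population="teachers")
print(planning.interviewees["s-1"])  # {'name': 'Alice', 'population': 'teachers'}
```

Because both contexts run in one process, the copy is updated immediately; if the monolith is later split, the EventBus would be the piece to swap for a real message broker, at which point the copy becomes eventually consistent.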
In my experience, with shared kernel, the models become god objects, one-size-fits-all kind of objects.

In my opinion, you have three entities:
study
survey
person
It is pretty intuitive to see that each of these is its own aggregate root. So we are talking about inter-root relationships. In my experience, those are meaningful entities on their own, and by far the cleanest and most future-proof approach is to treat those relationships as independent aggregate roots.
The relationship between a study and a person could be called TestSubject, and the relationship between a person and a survey could be called Interviewee or something similar. In another context, the person could be an employee of a company, and then Employee would be its own aggregate root. Information that relates only to the relationship, and not to the person or the study, should be kept in this relationship-specific aggregate root. For instance, this could be the date on which the subject started taking part in the study, and the end date (when he or she dropped out, if that happened prematurely, etc.).
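As an illustration only, a hypothetical TestSubject relationship aggregate in Python: it references the study and the person by ID and holds just the relationship's own data (start and end date), with simple invariants about when participation can end.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class TestSubject:
    """Hypothetical relationship aggregate between a study and a person."""
    study_id: str               # reference to the Study aggregate by ID
    person_id: str              # reference to the Person aggregate by ID
    start_date: date
    end_date: Optional[date] = None

    @property
    def is_active(self) -> bool:
        return self.end_date is None

    def drop_out(self, on: date) -> None:
        # Invariants owned by the relationship itself, not by Study or Person.
        if self.end_date is not None:
            raise ValueError("subject has already left the study")
        if on < self.start_date:
            raise ValueError("end date cannot precede start date")
        self.end_date = on


subject = TestSubject("study-1", "person-7", start_date=date(2024, 1, 10))
subject.drop_out(date(2024, 3, 1))
print(subject.is_active)  # False
```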
As for storage, each aggregate root should define its own separate repository as an interface and know only that interface; the implementations are free to use the same database or different ones, or even different kinds of storage, local or distributed. This holds for these 'relationship' aggregate roots as well. When you start out, though, you should almost force yourself to use different databases, and preferably even different technologies (e.g. Entity Framework for one, MongoDB for the other), to make sure your interfaces are properly defined and independent of the implementation.
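A minimal Python sketch of that advice, with hypothetical names: the domain-facing code depends only on an abstract PersonRepository, and two implementations use different storage technologies. A dict and sqlite3 stand in here for the Entity Framework/MongoDB pairing mentioned in the answer.

```python
import sqlite3
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Optional


@dataclass
class Person:
    person_id: str
    name: str


class PersonRepository(ABC):
    """The only thing the domain knows about: an interface, no storage details."""

    @abstractmethod
    def get(self, person_id: str) -> Optional[Person]: ...

    @abstractmethod
    def save(self, person: Person) -> None: ...


class InMemoryPersonRepository(PersonRepository):
    def __init__(self) -> None:
        self._items: dict[str, Person] = {}

    def get(self, person_id: str) -> Optional[Person]:
        return self._items.get(person_id)

    def save(self, person: Person) -> None:
        self._items[person.person_id] = person


class SqlitePersonRepository(PersonRepository):
    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS person (person_id TEXT PRIMARY KEY, name TEXT NOT NULL)"
        )

    def get(self, person_id: str) -> Optional[Person]:
        row = self._conn.execute(
            "SELECT person_id, name FROM person WHERE person_id = ?", (person_id,)
        ).fetchone()
        return Person(*row) if row else None

    def save(self, person: Person) -> None:
        # Upsert, so saving an existing aggregate overwrites it.
        self._conn.execute(
            "INSERT OR REPLACE INTO person (person_id, name) VALUES (?, ?)",
            (person.person_id, person.name),
        )


def rename(repo: PersonRepository, person_id: str, new_name: str) -> None:
    """Domain-facing code written only against the interface."""
    person = repo.get(person_id)
    if person is None:
        raise KeyError(person_id)
    person.name = new_name
    repo.save(person)
```

If rename behaves identically against both implementations, that is a decent sign the interface does not leak storage details.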
And yes, I'm a big fan of CQRS here as well, and of Event/Command Sourcing. There are many lightweight implementations that are very easy to get into, still allow you to scale up, and give you almost completely linear (i.e. predictable) complexity.
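To show how lightweight event sourcing can be, here is a rough Python sketch (all names hypothetical): the aggregate's state is never stored directly but rebuilt by replaying its past events, and new changes are recorded as pending events that a repository would append to an event store.

```python
from dataclasses import dataclass, field
from typing import Iterable, List


@dataclass(frozen=True)
class SubjectEnrolled:
    subject_id: str


@dataclass(frozen=True)
class SubjectWithdrawn:
    subject_id: str


@dataclass
class Study:
    """Event-sourced aggregate: current state is derived by replaying its events."""
    enrolled: set = field(default_factory=set)
    pending: List[object] = field(default_factory=list)

    @classmethod
    def from_history(cls, events: Iterable[object]) -> "Study":
        study = cls()
        for event in events:
            study._apply(event)  # replay only; nothing becomes pending
        return study

    def enroll(self, subject_id: str) -> None:
        if subject_id in self.enrolled:
            raise ValueError("already enrolled")
        self._record(SubjectEnrolled(subject_id))

    def withdraw(self, subject_id: str) -> None:
        if subject_id not in self.enrolled:
            raise ValueError("not enrolled")
        self._record(SubjectWithdrawn(subject_id))

    def _record(self, event: object) -> None:
        # Apply locally and keep the event so a repository can append it to the store.
        self._apply(event)
        self.pending.append(event)

    def _apply(self, event: object) -> None:
        if isinstance(event, SubjectEnrolled):
            self.enrolled.add(event.subject_id)
        elif isinstance(event, SubjectWithdrawn):
            self.enrolled.discard(event.subject_id)
```

A read model (the Q in CQRS) could be fed from the same events without touching this class.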

You can start with microservices that share a monolithic data source, but have each service use only partial domain entities and value objects.
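A rough Python sketch of that idea, assuming a single shared sqlite3 table with hypothetical columns (address, consent_signed): each context maps only the columns it cares about into its own small, read-only view of the subject.

```python
import sqlite3
from dataclasses import dataclass

# One shared (monolithic) data source used by both services.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE subject (id TEXT PRIMARY KEY, name TEXT, population TEXT,"
    " address TEXT, consent_signed INTEGER)"
)
conn.execute("INSERT INTO subject VALUES ('s-1', 'Alice', 'teachers', '1 Main St', 1)")


@dataclass(frozen=True)
class StudySubject:
    """Study-design's partial view: population membership and consent."""
    id: str
    population: str
    consent_signed: bool


@dataclass(frozen=True)
class VisitTarget:
    """Survey-planning's partial view: who to visit and where."""
    id: str
    name: str
    address: str


def load_study_subject(subject_id: str) -> StudySubject:
    row = conn.execute(
        "SELECT id, population, consent_signed FROM subject WHERE id = ?", (subject_id,)
    ).fetchone()
    return StudySubject(row[0], row[1], bool(row[2]))


def load_visit_target(subject_id: str) -> VisitTarget:
    row = conn.execute(
        "SELECT id, name, address FROM subject WHERE id = ?", (subject_id,)
    ).fetchone()
    return VisitTarget(*row)
```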

Can I say Axon Commands and Events are anemic models?

My question is pretty straightforward, as stated in the title.
However, let me briefly explain my line of thinking.
I've been using Axon for approximately 10 months now. I structure my projects following the Hexagonal architecture, with two top-level packages: one for the domain and one for infrastructure.
The domain package contains the different domain objects (as described in DDD), as follows:
Aggregate (this will be an Axon aggregate class).
Repository (in my case, this will be a Spring Data Repository interface).
Entity (in my case, this contains any lookup entity that I use for set-based consistency validation as written here).
Service Port (collection of Input and Output port interfaces).
Commands (representing Axon Command objects).
As for Events, I put them in a separate module that I compile as a jar file, so I can share it with other developers who are going to use the same events in their projects.
I've noticed recently that all of my commands and events are basically anemic models (an anti-pattern that we should avoid).
Is there any good practice for this? Or is it intentional, by design?
I've been thinking of putting my Command classes inside my Aggregate class (as inner classes). With this approach, at least, I won't end up with so many anemic models scattered around. Any thoughts?

Commands are input structures that express behavior and mirror the external world. They don't necessarily mirror an aggregate's structure.
At times, they are not even clearly tied to a single aggregate. Enclosing them within aggregates can be a code smell, because you are then thinking in terms of resources and UI organization instead of transaction boundaries and entity groups.
You are also violating the open-closed principle. Changes in volatile layers like user interface and request structures will make you edit the Aggregate class, and that is not good design.
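Axon is a Java framework; the following is a language-neutral sketch in Python, not Axon's API, and all names are hypothetical. The command stays a plain, immutable input structure shaped by the outside world, the aggregate carries the behavior and invariants, and a separate handler translates one into the other, so changes to the command's shape don't force edits to the aggregate class.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class RescheduleAppointmentCommand:
    """Plain input structure shaped by the outside world (e.g. an HTTP request)."""
    appointment_id: str
    new_start: datetime
    requested_via: str  # request metadata the aggregate does not need


class Appointment:
    """The aggregate owns behavior and invariants; it knows nothing about commands."""

    def __init__(self, appointment_id: str, start: datetime) -> None:
        self.appointment_id = appointment_id
        self.start = start
        self.cancelled = False

    def cancel(self) -> None:
        self.cancelled = True

    def reschedule(self, new_start: datetime) -> None:
        if self.cancelled:
            raise ValueError("cannot reschedule a cancelled appointment")
        self.start = new_start


def handle(command: RescheduleAppointmentCommand, appointments: dict) -> None:
    # Translation layer: changes to the command's shape are absorbed here.
    appointment = appointments[command.appointment_id]
    appointment.reschedule(command.new_start)
```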
On a more general note...
At times, this debate of anemic vs. non-anemic (or DRY vs. non-DRY) can push you toward premature (and incorrect) optimization. Try to avoid this trap, because you will end up optimising at the code level while your domain suffers.
DDD and CQRS guidelines align with principles that help you keep complexity at bay over the long term. Things kept distinct and separate help you achieve this.

First of all, in DDD your domain should be free of any framework and use only the language's standard library.
Second, mixing Commands and Aggregates is not a good solution. I think Commands belong to the Ports, while Aggregates belong to the hexagon.
Finally, DDD emphasizes discovering the domain together with domain experts. Did you do that? If you're only using the tactical patterns, you'll miss one of the most important parts of DDD.

Definition of a set of aggregates?

I'm struggling with how/if to define "a set of aggregates". Aggregates are supposed to be stand alone and isolated but it's easy to think of a bigger set of aggregates that belong together. But is this a trap?
Using this "set of aggregates" it would be possible to for instance enumerate and index aggregates on a unique property within the set and have other domain rules that could be validated across all aggregates in the set. It's tempting but also feels a bit wrong.
Another approach would be to avoid this thinking completely and not allow/define a set of aggregates and not allow enumerating aggregates but only load/save on aggregate-id. Using this option it would be necessary to reference aggregates from other aggregates and by doing this build up an interconnected graph of aggregates.
The approaches are similar to having aggregates in a folder on disk versus having an "internet" of aggregates where the references between them define the bigger set of aggregates. In any case, I'm really stuck on this problem. I have never read anywhere about this and I guess nobody really cares that much? I'm not sure I'm explaining this very well, but my question is whether there are any definitions of the "set of aggregates", or whether we should think of each aggregate as totally isolated, on its own, with only a unique aggregate-id (UUID)?
The set of aggregates could for instance be the database being used under the surface. But what I'm wondering is if this database as in the information about what aggregates it contains has any definition in DDD or if we should think about a set of aggregates as an interconnected graph where only traversal of this graph can be used to enumerate all "associated" aggregates.

Aggregates are connected
In any application with sufficient complexity, Aggregates end up referencing one-another. And it is perfectly reasonable to use their unique identifiers as reference IDs to refer to each other.
But take care to load and persist aggregates outside the domain layer, typically in repositories. If you want to traverse links across aggregates and load them into memory, you will be doing that upfront before handing over control to the domain layer for the actual processing.
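A minimal Python sketch of that flow, with hypothetical names and plain dicts standing in for repositories: Order refers to Customer only through customer_id, and the application service follows that link and loads both aggregates before handing control to the domain method, which never touches storage and changes only the Order.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Customer:
    customer_id: str
    default_address: str


@dataclass
class Order:
    order_id: str
    customer_id: str  # link to another aggregate by ID only
    shipping_address: Optional[str] = None

    def ship_to(self, address: str) -> None:
        # Pure domain logic: works on values it is given, never loads anything itself.
        if not address.strip():
            raise ValueError("shipping address is required")
        self.shipping_address = address


class ShipToDefaultAddress:
    """Hypothetical application service: loads linked aggregates up front, then calls the domain."""

    def __init__(self, orders: dict, customers: dict) -> None:
        self._orders = orders  # stand-ins for repositories
        self._customers = customers

    def __call__(self, order_id: str) -> None:
        order = self._orders[order_id]
        customer = self._customers[order.customer_id]  # follow the ID outside the domain
        order.ship_to(customer.default_address)  # copied as a plain value
        self._orders[order_id] = order  # "persist" the one changed aggregate
```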
Traversing the graph to get all related aggregates is fine, but it rarely spans many aggregate boundaries. You rarely find a single change or rule that must be applied throughout the application. If you do have such a transaction, it is probably a sign that the domain design needs improvement, simply because you are spreading one responsibility/change among many aggregates.
The connectivity is so usual that you should watch out for aggregates that have no linkages with the rest of the system. They are either standalone libraries, or they probably belong to a different bounded context.
Aggregates can morph into different forms
They are aggregates because they form a clear invariant boundary, with their primary responsibility being to enforce invariants across state changes for all the entities within themselves. But they can morph into different kinds of DDD objects based on the requirement.
A good example is a single currency note. In most applications, notes are value objects. But for a federal bank, they are aggregates with clear-cut invariant rules. They are aggregates when they are created and referenced, but in a transaction that ships printed notes to banks, they may become value objects.
So you may have to evaluate whether you are talking about a domain entity in its aggregate form, or as a value object when you consider each linkage.
Aggregates are invariant boundaries
It is wrong to validate domain rules across aggregates.
Your aggregate boundary is an invariant boundary, meaning all the domain rules within it should be satisfied at all times. By the same logic, validating rules across aggregates would mean building a structure that ensures all domain rules across those aggregates are valid at all times. Doing so imposes a considerable performance burden, not to mention the added complexity in the business logic.
That is not to say there can't be domain rules that span aggregates. The correct way to handle them is through eventual consistency and an event-driven approach.
The primary changing aggregate would validate and persist the data, and bubble up an event containing the state change. Other aggregates would then act on the event and bring themselves up-to-date. If an aggregate's domain rules break because of the change, there is usually a supplementary mechanism that allows correction of the problem (a preferred mechanism) or a rollback of the first state change (happens very rarely).
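A simplified, single-process Python sketch of that flow (names hypothetical, an in-memory queue standing in for real messaging): Order and Inventory each change only their own state, and when Inventory cannot satisfy an order it emits an event that leads to a correction on the Order (marking it backordered) rather than a rollback.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class OrderPlaced:
    order_id: str
    sku: str
    quantity: int


@dataclass(frozen=True)
class StockShortage:
    order_id: str


class Order:
    def __init__(self, order_id: str) -> None:
        self.order_id = order_id
        self.status = "placed"

    def mark_backordered(self) -> None:
        # Correction instead of rollback: the order stays, its state reflects the problem.
        self.status = "backordered"


class Inventory:
    def __init__(self, stock: dict) -> None:
        self.stock = stock

    def on_order_placed(self, event: OrderPlaced) -> list:
        # Inventory enforces its own invariant: stock never goes negative.
        if self.stock.get(event.sku, 0) < event.quantity:
            return [StockShortage(event.order_id)]
        self.stock[event.sku] -= event.quantity
        return []


def process(queue: deque, orders: dict, inventory: Inventory) -> None:
    """Handle events one by one; each handler changes exactly one aggregate."""
    while queue:
        event = queue.popleft()
        if isinstance(event, OrderPlaced):
            queue.extend(inventory.on_order_placed(event))
        elif isinstance(event, StockShortage):
            orders[event.order_id].mark_backordered()
```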

Perhaps you can find a common denominator that the aggregate sets share, and work with that?
A simplified example: there is a set of Books and a set of Users that have nothing in common, except that you want to know when each was first registered. One option is to define a FirstRegistration interface, and then either extend Books/Users to implement it or create a specific entity for it instead.
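A small Python sketch of that suggestion, using typing.Protocol as the interface (the field names are assumptions): anything exposing first_registered_on satisfies FirstRegistration, so one function can work on a mix of Books and Users without them sharing anything else.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Protocol


class FirstRegistration(Protocol):
    """The only thing the otherwise unrelated aggregates share."""

    @property
    def first_registered_on(self) -> date: ...


@dataclass
class Book:
    isbn: str
    first_registered_on: date


@dataclass
class User:
    username: str
    first_registered_on: date


def registered_before(items: Iterable[FirstRegistration], cutoff: date) -> List[FirstRegistration]:
    # Works on any mix of objects that satisfy FirstRegistration.
    return [item for item in items if item.first_registered_on < cutoff]
```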

I'm struggling with how/if to define "a set of aggregates". Aggregates are supposed to be stand alone and isolated but it's easy to think of a bigger set of aggregates that belong together. But is this a trap?
I think you're struggling because indeed the idea of a set of aggregates (instances) is very generic, and the uses of such things are contextual and domain-specific. People don't talk specifically about it because of course you may have behaviors that operate on a collection of multiple aggregates, but that doesn't give such collections any particular common properties or requirements that would allow you, from a general DDD perspective, to characterize such collections more specifically than "a set of aggregates", "a list of distinct aggregates", or similar.
Using this "set of aggregates" it would be possible to for instance enumerate and index aggregates on a unique property within the set and have other domain rules that could be validated across all aggregates in the set. It's tempting but also feels a bit wrong.
Tempting why? You've couched the question in very abstract terms, so it's pretty much impossible to contradict you about the "it would be possible", but just because something may be possible doesn't mean it would be useful. In practice, I think you'll find that rules or behaviors that operate on collections of aggregates most naturally belong not to collections of aggregates in an abstract sense, but rather to other aggregate types in your domain model, to domain repositories, or to domain services.
It is entirely plausible that your domain model might want to handle particular sets of aggregates characterized by some rule. For example, if you're an airline, then one of the aggregates in your domain model might be a single seat on a flight, since that's the unit you sell. It makes sense in that case that there would be operations on all the seats on a particular flight, for example, but whatever rules and behaviors you might have about that are specifically about that kind of aggregate, selected in that particular way.
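To make the airline example concrete, here is a hypothetical Python sketch: each Seat is its own aggregate, and the "set" of seats on a flight is nothing more than a repository query result that one specific behavior (listing available seats) works on.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Seat:
    """Aggregate: a single seat on a single flight, the unit that is sold."""
    flight_id: str
    seat_no: str
    sold: bool = False

    def sell(self) -> None:
        if self.sold:
            raise ValueError(f"{self.seat_no} already sold")
        self.sold = True


class SeatRepository:
    """Hypothetical in-memory repository; a 'set' of seats is just a query result."""

    def __init__(self) -> None:
        self._seats: Dict[Tuple[str, str], Seat] = {}

    def add(self, seat: Seat) -> None:
        self._seats[(seat.flight_id, seat.seat_no)] = seat

    def get(self, flight_id: str, seat_no: str) -> Seat:
        return self._seats[(flight_id, seat_no)]

    def on_flight(self, flight_id: str) -> List[Seat]:
        return [s for s in self._seats.values() if s.flight_id == flight_id]


def available_seats(repo: SeatRepository, flight_id: str) -> List[str]:
    # A behavior over a particular selection of Seat aggregates.
    return sorted(s.seat_no for s in repo.on_flight(flight_id) if not s.sold)
```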
Another approach would be to avoid this thinking completely and not allow/define a set of aggregates and not allow enumerating aggregates but only load/save on aggregate-id.
It's surely counterproductive to forbid working with sets of aggregates. Just don't attribute more significance to it than is warranted. There is nothing particularly special about sets of aggregates in general.
Using this option it would be necessary to reference aggregates from other aggregates and by doing this build up an interconnected graph of aggregates.
I don't follow that. One certainly must be able to retrieve and store individual aggregates from persistence, as that's more or less the defining property of aggregates -- they are the unit of persistence. But that doesn't mean that you must reject the ability to work with collections of aggregates. However, sets of aggregates do not have identity in the same way that individual aggregates do, so yes, relationships between aggregates need to be modeled in terms of individual aggregates. Nevertheless, that does not inherently preclude 1:m or n:m relationships among aggregates.
I'm really stuck on this problem. I have never read anywhere about this and I guess nobody really cares that much?
You'll find all sorts of uses of various sets of aggregates in applications built and maintained based on DDD ideas, but there's not much to talk about at the level of abstraction of your question, and what there is is already summed up in the words "set" and "aggregate".
The set of aggregates could for instance be the database being used under the surface. But what I'm wondering is if this database as in the information about what aggregates it contains has any definition in DDD
Not to my knowledge. I suspect most DDD practitioners would just call it "the data", or something similar.
or if we should think about a set of aggregates as an interconnected graph where only traversal of this graph can be used to enumerate all "associated" aggregates.
I'm still not seeing why you set that up as a thing. Sure, depending on the domain model, you might be able to traverse all or substantial chunks of the data by traversing associations between aggregates, and that might be appropriate for some purposes, but DDD doesn't have to give a special name or special rules for sets of aggregates for you to work with them.
Like any useful methodology, DDD exists to solve problems. Its bread & butter is complex applications with complex data and evolving requirements. It is not to be interpreted as a straitjacket preventing designers and developers from (thoughtfully) writing designs and code that incorporate aspects of other design approaches, much less designs and code that provide for the application's idiosyncratic needs.

microservice shared domain layer

I have a question about microservices architecture. We are developing an ERP with several microservices, such as Human Resources, Identity, Orders and so on.
We've implemented a shared domain layer for entities that are common to all those services, including abstractions (interfaces) of Company, Location and some value objects.
My question is: What's the boundary of shared items for microservices and how bad is that?
Those shared entities would be the same for each microservice, which helps us write less code BUT at the same time creates a small level of coupling.

Microservice architectures usually adopt a "share nothing" concept, which means your code bases should ideally be separate. Yes, that means you will write more code, but it will keep your microservices more manageable, decoupled and probably lighter.
Also, regarding the DDD part of the question, you should really strive to keep well-defined boundaries within your application, which means you shouldn't be scared of having "redundant" entities in different bounded contexts, because the same concept usually means different things to different domain areas of your application.
Staying with the ERP theme, you'd expect the "Order Placing" context of your application to have quite a different view of the "Product" entity than the "Tax" context. Keeping those in distinct contexts, in different code bases, lets you model smaller, more cohesive aggregates that are much less coupled to the other constructs of your model, making your microservices much easier to evolve.
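A minimal Python sketch of two Product models for the same SKU (fields and tax rates are illustrative assumptions). In separate code bases both classes would simply be called Product; here they get a prefix so they can sit in one runnable block.

```python
from dataclasses import dataclass
from decimal import Decimal


# Order Placing context: cares about what the customer sees and can buy.
@dataclass(frozen=True)
class OrderPlacingProduct:
    sku: str
    display_name: str
    unit_price: Decimal
    in_stock: bool


# Tax context: same real-world thing, modelled only for tax purposes.
@dataclass(frozen=True)
class TaxProduct:
    sku: str
    tax_category: str  # e.g. "standard" or "reduced"

    def tax_rate(self) -> Decimal:
        # Hypothetical rates, for illustration only.
        return {"standard": Decimal("0.20"), "reduced": Decimal("0.05")}[self.tax_category]
```

The only thing the two models share is the SKU, which is enough to correlate them without coupling their code.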

My question is: What's the boundary of shared items for microservices and how bad is that?
Until a few years ago it was hard to define the boundaries of a microservice, because there was simply no agreement on how to achieve that, but Evans sorted that out:
GOTO 2015 • DDD & Microservices: At Last, Some Boundaries! • Eric Evans
Microservices also follow the four tenets of SOA, and the same fallacies of distributed computing have to be taken into consideration, even though their business scopes are different. Bear in mind that a microservice architecture should follow a shared-nothing style of architecture, so services don't really share entities; instead, they subscribe to messages, typically on a bus, and store local copies of the pieces of data they are interested in. This obviously introduces another concept, eventual consistency, which, depending on your business requirements, might or might not fit your overall design.

data access in DDD?

After reading Evans's and Nilsson's books, I am still not sure how to manage data access in a domain-driven project. Should the CRUD methods be part of the repositories, e.g. OrderRepository.GetOrdersByCustomer(customer), or should they be part of the entities: Customer.GetOrders()? The latter approach seems more OO, but it distributes data access for a single entity type among multiple objects, e.g. Customer.GetOrders(), Invoice.GetOrders(), ShipmentBatch.GetOrders(), etc. What about inserting and updating?

CRUD-ish methods should be part of the Repository...ish. But I think you should ask why you have a bunch of CRUD methods. What do they really do? What are they really for? If you actually call out the data access patterns your application uses I think it makes the repository a lot more useful and keeps you from having to do shotgun surgery when certain types of changes happen to your domain.
CustomerRepo.GetThoseWhoHaventPaidTheirBill();
// or
CustomerRepo.GetCustomers(new HaventPaidBillSpecification());
// is better than
foreach (var customer in CustomerRepo.GetCustomers()) {
    /* logic leaking all over the floor */
}
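The same idea as a small Python sketch rather than C# (names mirror the snippet above but are otherwise hypothetical): a specification object wraps the business rule, and the repository exposes an intention-revealing method built on it. The filtering runs in memory here; a real repository would typically translate the specification into a database query.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Customer:
    name: str
    outstanding_balance: int


class Specification:
    """Wraps a business rule so callers don't loop and filter themselves."""

    def __init__(self, predicate: Callable[[Customer], bool]) -> None:
        self._predicate = predicate

    def is_satisfied_by(self, customer: Customer) -> bool:
        return self._predicate(customer)

    def __and__(self, other: "Specification") -> "Specification":
        return Specification(lambda c: self.is_satisfied_by(c) and other.is_satisfied_by(c))


class HaventPaidBillSpecification(Specification):
    def __init__(self) -> None:
        super().__init__(lambda c: c.outstanding_balance > 0)


class CustomerRepository:
    def __init__(self, customers: List[Customer]) -> None:
        self._customers = list(customers)

    def find(self, spec: Specification) -> List[Customer]:
        return [c for c in self._customers if spec.is_satisfied_by(c)]

    def get_those_who_havent_paid_their_bill(self) -> List[Customer]:
        # Intention-revealing method built on the specification.
        return self.find(HaventPaidBillSpecification())
```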
"Save" type methods should also be part of the repository.
If you have aggregate roots, this keeps you from having a repository explosion, or logic spread out all over: you don't have 4 x (number of entities) data access patterns, just the ones you actually use on the aggregate roots.
That's my $.02.

DDD usually prefers the repository pattern over the active record pattern you hint at with Customer.Save.
One downside in the Active Record model is that it pretty much presumes a single persistence model, barring some particularly intrusive code (in most languages).
The repository interface is defined in the domain layer, but doesn't know whether your data is stored in a database or not. With the repository pattern, I can create an InMemoryRepository so that I can test domain logic in isolation, and use dependency injection in the application to have the service layer instantiate a SqlRepository, for example.
To many people, having a special repository just for testing sounds goofy, but if you use the repository model, you may find that you don't really need a database for your particular application; sometimes a simple FileRepository will do the trick. Wedding yourself to a database before you know you need it is potentially limiting. Even if a database is necessary, it's a lot faster to run tests against an InMemoryRepository.
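A Python sketch of that setup, with hypothetical names: CheckoutService receives its repository through the constructor, so tests can pass an InMemoryOrderRepository while the application wires in, say, a simple JSON-file-backed FileOrderRepository, without the service changing.

```python
import json
from pathlib import Path
from typing import Dict, Protocol


class OrderRepository(Protocol):
    def get_total(self, order_id: str) -> int: ...
    def save_total(self, order_id: str, total: int) -> None: ...


class InMemoryOrderRepository:
    def __init__(self) -> None:
        self._totals: Dict[str, int] = {}

    def get_total(self, order_id: str) -> int:
        return self._totals.get(order_id, 0)

    def save_total(self, order_id: str, total: int) -> None:
        self._totals[order_id] = total


class FileOrderRepository:
    """Keeps everything in one JSON file; enough for some small applications."""

    def __init__(self, path: Path) -> None:
        self._path = path

    def _load(self) -> Dict[str, int]:
        return json.loads(self._path.read_text()) if self._path.exists() else {}

    def get_total(self, order_id: str) -> int:
        return self._load().get(order_id, 0)

    def save_total(self, order_id: str, total: int) -> None:
        data = self._load()
        data[order_id] = total
        self._path.write_text(json.dumps(data))


class CheckoutService:
    """Domain/service logic; receives its repository from outside (constructor injection)."""

    def __init__(self, repo: OrderRepository) -> None:
        self._repo = repo

    def add_item(self, order_id: str, price: int) -> int:
        if price <= 0:
            raise ValueError("price must be positive")
        total = self._repo.get_total(order_id) + price
        self._repo.save_total(order_id, total)
        return total
```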
If you don't have much in the way of domain logic, you probably don't need DDD. ActiveRecord is quite suitable for a lot of problems, especially if you have mostly data and just a little bit of logic.

Let's step back for a second. Evans recommends that repositories return aggregate roots and not just entities. So assuming that your Customer is an aggregate root that includes Orders, then when you fetched the customer from its repository, the orders came along with it. You would access the orders by navigating the relationship from Customer to Orders.
var orders = customer.Orders;
So to answer your question, CRUD operations are present on aggregate root repositories.
CustomerRepository.Add(customer);
CustomerRepository.Get(customerID);
CustomerRepository.Save(customer);
CustomerRepository.Delete(customer);

I've done it both ways you describe. My preferred approach now is the persistence-ignorant (or PONO: Plain Ole' .Net Object) method, where your domain classes are only concerned with being domain classes. They do not know anything about how they are persisted, or even whether they are persisted. Of course, you have to be pragmatic about this at times and allow for things such as an Id (but even then I just use a layer supertype that holds the Id, so I have a single place where things like default values live).
The main reason for this is that I strive to follow the principle of Single Responsibility. By following this principle I've found my code much more testable and maintainable. It's also much easier to make changes when they are needed since I only have one thing to think about.
One thing to watch out for is the method bloat that repositories can suffer from: GetOrderByCustomer, GetAllOrders, GetOrders30DaysOld, etc. One good solution to this problem is the Query Object pattern; your repositories can then just take in a query object to execute.
I'd also strongly recommend looking into something like NHibernate. It includes a lot of the concepts that make Repositories so useful (Identity Map, Cache, Query objects..)

Even in a DDD, I would keep Data Access classes and routines separate from Entities.
Reasons are,
Testability improves
Separation of concerns and Modular design
More maintainable in the long run, as you add entities, routines
I am no expert, just my opinion.

The annoying thing with Nilsson's Applying DDD&P is that he always starts with "I wouldn't do that in a real-world-application but..." and then his example follows. Back to the topic: I think OrderRepository.GetOrdersByCustomer(customer) is the way to go, but there is also a discussion on the ALT.Net Mailing list (http://tech.groups.yahoo.com/group/altdotnet/) about DDD.

Am I allowed to have "incomplete" aggregates in DDD?

DDD states that you should only ever access entities through their aggregate root. So say for instance that you have an aggregate root X which potentially has a lot of child Y entities. Now, for some scenario, you only really care about a subset of these Y entities at a time (maybe you're displaying them in a paged list or whatever).
Is it OK to implement a repository, then, so that in such scenarios it returns an incomplete aggregate? I.e. an X object whose Ys collection only contains the Y instances we're interested in and not all of them? This could, for instance, cause methods on X that perform some calculation involving the Ys to not behave as expected.
Is this perhaps an indication that the Y entity in question should be considered promoted to an aggregate root?
My current idea (in C#) is to leverage the delayed execution of LINQ, so that my X object has an IQueryable to represent its relationship with Y. This way, I can have transparent lazy loading with filtering... But getting this to work with an ORM (Linq to Sql in my case) might be a bit tricky.
Any other clever ideas?

I consider an aggregate root with a lot of child entities to be a code smell, or a DDD smell if you will. :-) Generally I look at two options.
Split your aggregate into many smaller aggregates. This means that my original design was not optimal and I need to identify some new entities.
Split your domain into multiple bounded contexts. This means that there are specific sets of scenarios that use a common subset of the entities in the aggregate, while there are other sets of scenarios that use a different subset.

Jimmy Nilsson hints in his book that instead of reading a complete aggregate you can read a snapshot of parts of it. But you are not supposed to be able to save changes in the snapshot classes to the database.
Jimmy Nilsson's book Chapter 6: Preparing for infrastructure - Querying. Page 226.
Snapshot pattern
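A rough Python sketch of the idea as described above, not code from Nilsson's book (names and the in-memory "table" are hypothetical): the snapshot reads only the rows it needs and is a frozen dataclass with no save path, so changes cannot be written back through it.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Stand-in for the table behind the Order aggregate: (order_id, product, quantity).
ORDER_LINES: List[Tuple[str, str, int]] = [
    ("o1", "pen", 3), ("o1", "ink", 1), ("o1", "paper", 10), ("o2", "pen", 1),
]


@dataclass(frozen=True)
class OrderLinesSnapshot:
    """Read-only view of part of an Order aggregate; nothing can save it back."""
    order_id: str
    lines: Tuple[Tuple[str, int], ...]


def snapshot_first_lines(order_id: str, limit: int) -> OrderLinesSnapshot:
    # Reads only the rows needed (e.g. one page) without loading the whole aggregate.
    rows = [(product, qty) for (oid, product, qty) in ORDER_LINES if oid == order_id][:limit]
    return OrderLinesSnapshot(order_id, tuple(rows))
```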

You're really asking two overlapping questions.
The title and first half of your question are philosophical/theoretical. I think the reason for accessing entities only through their "aggregate root" is to abstract away the kinds of implementation details you're describing. Access through the aggregate root is a way to reduce complexity by having a trusted point of access. You're eliminating friction/ambiguity/uncertainty by adhering to a convention. It doesn't matter how it's implemented within the root; you just know that when you ask for an entity it will be there. I don't think this perspective rules out a "filtered repository" as you describe. But to provide a pit of success for devs to fall into, it should be impossible to instantiate the repository without being explicit about its "filteredness"; likewise, if shared access to a repository instance is possible, the "filteredness" should be explicit when coding in the caller.
The second half of your question is about implementation on a specific platform. Not sure why you mention delayed execution, I think that's really orthogonal to the filtering question. The filtering itself could be a bit tricky to implement with LINQ. Maybe rather than inlining the Where lambdas, you set up a collection of them and select one depending on the filter you need.
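A Python sketch combining both suggestions, with hypothetical names: the filters live in a named collection instead of being inlined, and the repository cannot be constructed without naming its filter, which stays visible to every caller that shares the instance. A list comprehension stands in for LINQ's Where.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Y:
    y_id: int
    archived: bool


# A named collection of filters instead of inlined Where-style lambdas.
Y_FILTERS: Dict[str, Callable[[Y], bool]] = {
    "all": lambda y: True,
    "active": lambda y: not y.archived,
    "archived": lambda y: y.archived,
}


class FilteredYRepository:
    """Cannot be built without naming its filter, so callers always know what they get."""

    def __init__(self, source: List[Y], filter_name: str) -> None:
        if filter_name not in Y_FILTERS:
            raise ValueError(f"unknown filter: {filter_name}")
        self._source = source
        self.filter_name = filter_name  # visible to anyone sharing this instance

    def children(self) -> List[Y]:
        return [y for y in self._source if Y_FILTERS[self.filter_name](y)]
```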

You are allowed since the code will compile anyway, but if you're going for a pure DDD design you should not have incomplete instances of objects.
You should look into LazyLoading if you're afraid to load a huge object of which you will only use a small portion of its child entities.
LazyLoading delays the loading of whatever you decide to lazy-load until the moment it is accessed. It uses callbacks to call the loading method once the code asks for the data.
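A minimal Python sketch of lazy loading through a callback (names hypothetical): the loader function is only called the first time the children collection is accessed, and the result is cached afterwards.

```python
from typing import Callable, List, Optional


class X:
    """Aggregate root whose Y children are loaded on first access via a callback."""

    def __init__(self, x_id: int, load_children: Callable[[int], List[str]]) -> None:
        self.x_id = x_id
        self._load_children = load_children
        self._children: Optional[List[str]] = None

    @property
    def children(self) -> List[str]:
        if self._children is None:
            # The callback runs only the first time the collection is accessed.
            self._children = self._load_children(self.x_id)
        return self._children
```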

Is it OK to implement a repository then, so that in such scenarios it returns an incomplete aggregate?
Not at all. An aggregate is a transactional boundary for changing the state of your system. Never use aggregates for querying data; split the system into write and read sides (read about CQS & CQRS). When we think "CRUD"-based, we implement our system around some resource. Let's say you have an "Appointment" aggregate. Thinking "CRUD-ish" means we implement the use cases Create, Update, Delete and GetAll appointments, so Appointment[] would be returned for GetAll. When you think use-case-based (Hexagonal Architecture), your use cases would be ScheduleAppointment, RescheduleAppointment and CancelAppointment. For the query side, though, it can be /myCalendar: we return all appointments for a specific user in a ClientCalendar object. Create separate DTOs for the query side. Never use aggregates for this purpose.
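A minimal Python sketch of such a query side, assuming a hypothetical read model kept up to date by the write side: the /myCalendar query builds a ClientCalendar DTO directly from read-optimized rows and never loads an Appointment aggregate.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Tuple


@dataclass(frozen=True)
class CalendarEntry:
    """Query-side DTO: shaped for the screen, not for enforcing invariants."""
    title: str
    starts_at: datetime


@dataclass(frozen=True)
class ClientCalendar:
    user_id: str
    entries: Tuple[CalendarEntry, ...]


# Stand-in for a read-optimized table that the write side keeps up to date.
APPOINTMENT_READ_MODEL: List[dict] = [
    {"user_id": "u1", "title": "Dentist", "starts_at": datetime(2024, 5, 2, 9)},
    {"user_id": "u2", "title": "Haircut", "starts_at": datetime(2024, 5, 1, 15)},
    {"user_id": "u1", "title": "Check-up", "starts_at": datetime(2024, 5, 1, 8)},
]


def my_calendar(user_id: str) -> ClientCalendar:
    """Handles the /myCalendar query without loading any Appointment aggregate."""
    rows = sorted(
        (r for r in APPOINTMENT_READ_MODEL if r["user_id"] == user_id),
        key=lambda r: r["starts_at"],
    )
    return ClientCalendar(user_id, tuple(CalendarEntry(r["title"], r["starts_at"]) for r in rows))
```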
