How many event storages should we use with multiple bounded contexts? - domain-driven-design

I am currently reading about DDD and have not managed to find an answer to this question. If we have a large application with multiple bounded contexts, then as far as I know we should implement each BC as if it were a separate application. Thus it is logical to conclude that each BC has its own UI and event storage. I previously thought that we have only a single event storage, because according to some articles (about CQRS) it is the single source of truth. The only problem with these statements is that they lack context. So is an event storage the single source of truth in a single bounded context or in the entire application?

"Is an ES the single source of truth in a bounded context or in entire application?"
I guess you meant system, because in the simplest explanation a bounded context is an application.
"If we have a large application with multiple bounded contexts"
You can't have multiple bounded contexts within the same model, because a bounded context is what delimits a model. If you replace the term bounded context with subdomain, the statement becomes correct.
Anyway, to answer your question: it depends.
Single Event Store for whole system
Pros
One place to manage
It is easy to see related events by CorrelationID
In some setups there is no need for service discovery: all services (applications) can integrate via the single ES (I am talking about a true event store, not just a data storage)
Less CPU/memory needed
Cons
Single point of failure (of course, you can scale it to avoid that situation)
You're coupling services together (breaking a microservices rule)
Obliged not to change the ES during the system's lifetime
One Event Store per application
Pros
No single point of failure
Deployed with application
No coupling between services. More autonomy
If the application is disabled, its ES can be disabled with it
New services can work with new versions of the ES, or even a different ES
Cons
Additional databases to take care of and monitor
More CPU/RAM consumed
Harder to manage correlation IDs, because they are split across multiple ES
Some service discovery is needed for subscribing to multiple ES, or else an extra message queue
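To make the correlation ID trade-off concrete, here is a minimal sketch of what following one business transaction looks like with one event store per application. Everything in it is hypothetical (the Event fields, the store names, the assumption that timestamps are comparable across stores); a real event store would expose its own query API.

```python
from dataclasses import dataclass
from heapq import merge
from typing import Iterable, Iterator, List


@dataclass(frozen=True)
class Event:
    store: str            # which event store the event came from
    timestamp: float      # assumed comparable across stores
    correlation_id: str
    name: str


def _matching(store: Iterable[Event], correlation_id: str) -> Iterator[Event]:
    """Filter one store's stream down to a single correlation ID."""
    return (e for e in store if e.correlation_id == correlation_id)


def trace(correlation_id: str, *stores: Iterable[Event]) -> Iterator[Event]:
    """Merge matching events from every store into one timeline.

    Each store is assumed to return its events sorted by timestamp.
    """
    streams = [_matching(store, correlation_id) for store in stores]
    return merge(*streams, key=lambda e: e.timestamp)


# Three per-application stores, each holding part of the same transaction.
orders: List[Event] = [
    Event("orders", 1.0, "c-1", "OrderPlaced"),
    Event("orders", 4.0, "c-2", "OrderPlaced"),
]
invoicing: List[Event] = [Event("invoicing", 2.0, "c-1", "InvoiceIssued")]
shipping: List[Event] = [Event("shipping", 2.5, "c-1", "ShipmentScheduled")]

timeline = [e.name for e in trace("c-1", orders, invoicing, shipping)]
```

With a single event store the same question is one filtered read; with several stores you need to know where all of them are (the service discovery point above) and merge the results yourself.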

Related

Which project in a solution to add a Domain Service that spans two aggregates?

I currently have two projects within one Visual Studio solution. Each project represents a different aggregate. I need to add a domain service that interacts with the two aggregate roots. Which project should I add it to? Does it matter?
If your aggregate roots both belong to the same bounded context then your aggregate roots should probably be in the same project; else the domain service may be in another project that references the two aggregate root projects but that is going to become unwieldy quite quickly. A domain project per bounded context should suffice.
However, if the two aggregate roots are in separate bounded contexts then the "easiest" would be to use some form of messaging and have a process manager in an orchestration layer handle the interaction between the various bounded context endpoints. For this I usually have BC specific orchestration endpoints and BC specific "functional" endpoints where a functional endpoint handles BC specific functions. A BC specific orchestration endpoint, however, contains the BC specific process managers but typically interacts with other functional endpoints from whichever BC it requires a service to be fulfilled.
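As a rough illustration of the messaging approach, a process manager can be sketched as an object that listens for events from one bounded context and sends commands to another. The bus, the message names and the order example below are all made up for the sketch; a real system would use an actual message broker.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Message:
    name: str
    payload: dict


class Bus:
    """In-memory stand-in for a message broker (hypothetical)."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Callable[[Message], None]]] = {}
        self.log: List[str] = []  # every message name, in send order

    def subscribe(self, name: str, handler: Callable[[Message], None]) -> None:
        self._handlers.setdefault(name, []).append(handler)

    def send(self, message: Message) -> None:
        self.log.append(message.name)
        for handler in list(self._handlers.get(message.name, [])):
            handler(message)


class OrderFulfilmentProcess:
    """Process manager: reacts to events, issues commands to other contexts."""

    def __init__(self, bus: Bus) -> None:
        self._bus = bus
        bus.subscribe("OrderPlaced", self._on_order_placed)
        bus.subscribe("InvoicePaid", self._on_invoice_paid)

    def _on_order_placed(self, message: Message) -> None:
        # Event from the ordering context becomes a command for invoicing.
        order_id = message.payload["order_id"]
        self._bus.send(Message("CreateInvoice", {"order_id": order_id}))

    def _on_invoice_paid(self, message: Message) -> None:
        # Event from the invoicing context becomes a command for shipping.
        order_id = message.payload["order_id"]
        self._bus.send(Message("ShipOrder", {"order_id": order_id}))


bus = Bus()
process = OrderFulfilmentProcess(bus)
bus.send(Message("OrderPlaced", {"order_id": 7}))
bus.send(Message("InvoicePaid", {"order_id": 7}))
```

Neither bounded context references the other here; only the process manager knows the sequence.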
Imagine you're building an e-commerce app. When you create a product, the data should be decomposed by UI micro-controllers that belong to their respective bounded contexts (shipping, invoicing). Like hyenas, each one tears off the information it is interested in and sends it to its own bounded context's data storage, to be used later when fulfilling that context's capabilities. For example, as an isolated shipping module I need the product weight in order to calculate the shipping cost, so I should take that piece of information when the user enters it in the UI.
https://www.youtube.com/watch?v=hev65ozmYPI - Check out this link.

DDD, CQRS/ES & MicroServices Should Decisions be taken on Microservice's views or aggregates?

So I'll explain the problem through the use of an example as it makes everything more concrete and hopefully will reduce ambiguity.
The Architecture is pretty simple
1 MicroService <=> 1 Aggregate <=> Transactional Boundary
Each microservice will be using CQRS/ES design pattern which implies
Each microservice will have its own Aggregate mapping the domain of a real-world problem
The state of the aggregate will be rebuilt from an event store
Each event will signify a state change within the aggregate and will be transmitted to any service interested in the change via a message broker
Each microservice will be transactional within its own domain
Each microservice will be eventually consistent with other domains
Each microservice will build its own view models from events emitted by other microservices
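To make the points above concrete, here is a minimal sketch of an aggregate whose state is rebuilt from its events. The event names (Deposited, Withdrawn), the tuple representation and the class itself are assumptions made for illustration, not part of any particular framework.

```python
from typing import Iterable, List, Tuple

Event = Tuple[str, int]  # (event name, amount in cents); hypothetical shape


class InsufficientFunds(Exception):
    pass


class CurrentAccount:
    """Event-sourced aggregate: state is only ever changed by applying events."""

    def __init__(self) -> None:
        self.balance = 0
        self.uncommitted: List[Event] = []  # new events not yet stored/published

    @classmethod
    def from_history(cls, events: Iterable[Event]) -> "CurrentAccount":
        """Rebuild the aggregate by replaying events read from the event store."""
        account = cls()
        for event in events:
            account._apply(event)
        return account

    def deposit(self, amount: int) -> None:
        self._record(("Deposited", amount))

    def withdraw(self, amount: int) -> None:
        # The invariant is checked against state rebuilt from the aggregate's own events.
        if amount > self.balance:
            raise InsufficientFunds(f"balance {self.balance} < {amount}")
        self._record(("Withdrawn", amount))

    def _record(self, event: Event) -> None:
        self._apply(event)
        self.uncommitted.append(event)

    def _apply(self, event: Event) -> None:
        name, amount = event
        if name == "Deposited":
            self.balance += amount
        elif name == "Withdrawn":
            self.balance -= amount


history = [("Deposited", 100), ("Withdrawn", 30)]
account = CurrentAccount.from_history(history)
account.withdraw(50)
```

After the command, account.uncommitted holds the one new event that would be appended to the event store and handed to the message broker.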
For the example, let's say we have a banking system:
current-account microservice is responsible for mapping the Customer Current Account ... Withdrawal, Deposits
rewards microservice will be responsible for inventory and stock take of any rewards being served by the bank
air-miles microservice will be responsible for monitoring all the transactions coming from the current-account and, in doing so, awarding the Customer with rewards from our rewards microservice
So the problem is this: should the air-miles microservice take decisions based on its own view model, which is being updated from events coming from the current-account, and similarly when picking which reward it should give out to the Customer?
Drawbacks of taking decisions on local view models;
Replicating domain logic on how to maintain these views
Bugs within the view might propagate the wrong rewards to be given out
State changes (aka events emitted) on corrupted view models could have consequences in other services which are taking their own decisions on these events
Advantages of taking a decision on local view models;
The system doesn't need to constantly query the microservice owning the domain
The system should be faster and less resource intense
Or should it use the events coming from the service to trigger queries to the Aggregate owning the Domain? In doing so we accept the fact that view models might get corrupted, but the final decision is always checked with the aggregate owning the domain.
Please note that the above problem is simply my understanding of the architecture. The aim of this post is to get different views on how one might use this architecture effectively in a microservice environment, to keep each service decoupled yet avoid cascading corruption scenarios without too much chatter between the services.
So the problem is this: should the air-miles microservice take decisions based on its own view model, which is being updated from events coming from the current-account, and similarly when picking which reward it should give out to the Customer?
Yes. In fact, you should revise your architecture and even create more microservices. What I mean is that, this being an event-driven (and also event-sourced) architecture, your microservices have two responsibilities: they need to keep two different models, the write model and the read model.
So, for each Aggregate there should be a microservice that keeps only the write model; that is, it only processes Commands and does not also build a read model.
Then, for each read/query use case you should have a microservice that builds the perfect read model. This is required if you need to keep the Aggregate microservice clean (as you should), because in general read models need data from multiple Aggregate types/bounded contexts. Read models may cross bounded context boundaries; Aggregates may not. So you see, you don't really have a choice if you need to fully respect DDD.
Some say that domain events should be hidden, local only to the owning microservice. I disagree. In an event-driven architecture the domain events are first-class citizens; they are allowed to reach other microservices. This gives the other microservices the chance to build their own interpretation of the system state. Otherwise, the emitting microservice would have the impossible additional responsibility of building a state that matches every possible need that all the microservices would ever have(!). For example, maybe a microservice wants to look up a deleted remote entity's title; how could it do that if the emitting microservice keeps only the list of not-yet-deleted entities? You may say: but then it will keep all the entities, deleted or not. But maybe someone needs the date on which an entity was deleted; you may say: but then I also keep the deletedDate. You see what you are doing? You break the Open/closed principle. Every time you create a microservice you need to modify the emitting microservice.
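Here is a small sketch of the deleted-entity example: a consuming microservice builds its own read model from the published events, so the emitting microservice never has to know that someone cares about deleted titles or deletion dates. The event shapes (EntityCreated, EntityRenamed, EntityDeleted and their fields) are assumptions made for the sketch.

```python
from typing import Dict, Optional


class EntityTitlesView:
    """Read model owned by the consumer, fed only by events it receives."""

    def __init__(self) -> None:
        self.titles: Dict[str, str] = {}      # kept even after deletion
        self.deleted_at: Dict[str, str] = {}  # entity id -> deletion date

    def handle(self, event: dict) -> None:
        kind = event["type"]
        if kind in ("EntityCreated", "EntityRenamed"):
            self.titles[event["id"]] = event["title"]
        elif kind == "EntityDeleted":
            # The emitter forgets the entity; this view chooses to remember it.
            self.deleted_at[event["id"]] = event["at"]

    def title_of(self, entity_id: str) -> Optional[str]:
        return self.titles.get(entity_id)

    def is_deleted(self, entity_id: str) -> bool:
        return entity_id in self.deleted_at


view = EntityTitlesView()
for event in [
    {"type": "EntityCreated", "id": "e1", "title": "Draft"},
    {"type": "EntityRenamed", "id": "e1", "title": "Quarterly report"},
    {"type": "EntityDeleted", "id": "e1", "at": "2020-01-15"},
]:
    view.handle(event)
```

A new consumer with a different need writes a different handler over the same events; nothing changes on the emitting side.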
There is also the resilience of the microservices. In The Art of Scalability, the authors speak about swimming lanes. They are a strategy to separate the components of a system into lanes of failure. A failure in one lane does not propagate to other lanes. Our microservices are lanes. Components in a lane are not allowed to access any component from another lane. One microservice being down should not bring the others down. It's not a matter of speed/optimisation, it's a matter of resilience. Domain events are the perfect way of keeping two remote systems synchronized. They also emphasize the fact that the data is eventually consistent; the events travel at a limited speed (from nanoseconds to even days). When a system is designed with that in mind, no other microservice can bring it down.
Yes, there will be some code duplication. And yes, although I said that you don't have a choice, you do. In order to reduce the code duplication at the cost of lower resilience, you can have some Canonical read models that build a normal flat state which other microservices can query. This is dangerous in most cases, as it breaks the swimming lanes concept: should the Canonical microservice go down, all dependent microservices go down with it. Canonical microservices work best for CRUD-like bounded contexts.
There are however valid cases when you may have some internal events that you don't want to expose. In other words, you are not required to publish all domain events.
So the problem is this: should the air-miles microservice take decisions based on its own view model, which is being updated from events coming from the current-account, and similarly when picking which reward it should give out to the Customer?
Each consumer uses a local replica of a representation computed by the producer.
So if air-miles needs information from current-account it should be looking at a local replica of a view calculated by the current-account service.
The key idea is this: micro services are supposed to be isolated from one another; you should be able to redesign and deploy one without impacting the others.
So try this thought experiment - suppose we had these three micro services, but all saving snapshots of current state, rather than events. Everything works, then imagine that the current-account maintainer discovers that an event sourced implementation would better serve the business.
Should the change to the current-account require a matching change in the air-miles service? If so, can we really claim that these services are isolated from one another?
Advantages of taking a decision on local view models
I don't particularly like these "advantages"; first, they are dominated by the performance axis (please recall that the second rule of performance optimization is "not yet"). And second, that they assume that the service boundaries are correctly drawn; maybe the performance issue is evidence that the separation of responsibilities needs review.

DDD/CQRS/ES Implement aggregate member using graph database aka using an immediately consistent readModel as entity collection

Abstract
I am modelling a generic authorization subdomain for my application. The requirements are quite complicated as it needs to cope with multi tenants, hierarchical organisation structure, resource groups, user groups, permissions, user-editable permissions and so on. It's a mixture of RBAC (users assigned to roles, roles having permissions, permissions can execute commands) with claims-based auth.
Problem
When checking for business rule invariants, I have to traverse the permission "graph" to find a permission for a user to execute a command on a resource in an environment. The traversal depth is arbitrary, on multiple dimensions.
I could model this using code, but it would be best represented using a graph database as queries/updates on this aggregate would be faster. Also, it would reduce the complexity of the code itself. But this would require the graph database to be immediately consistent.
Still, I need to use CQRS/ES, and enable a distributed architecture.
So the graph database needs to be
Immediately consistent
And this introduces some drawbacks
When loading events from event-store, we have to reconstruct the graph database each time
Or, we have to introduce some kind of graph database snapshotting
Overhead when communicating with the graph database
But it has advantages
Reduced complexity of performing complex queries
Complex queries are resolved faster than with code
The graph database is perfect for this job
Why this question?
In other aggregates I modelled, I often have an EntityList instance or EntityHierarchy instance. They are basically ordered/hierarchical collections of sub-entities. Their implementation is arbitrary. They can support anything from indexing to key-value pairs, dynamic arrays, etc., as long as they implement the interfaces I declared for them. I often even have methods like findById() or findByName() on those entities (lists). Those methods are similar to methods that could be executed on a database, but they're executed in-memory.
Thus, why not have an implementation of such a list that could be bound to a database? For example, instead of having a TMemoryEntityList, I would have a TMySQLEntityList. In the case at hand, perhaps having an implementation of a TGraphAuthorizationScheme that would live inside a TOrgAuthPolicy aggregate would be desirable, as long as it behaves like a collection, is iterable and supports the defined interfaces.
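To illustrate the idea of an interface with swappable implementations, here is a rough sketch with an in-memory version only. The names AuthorizationScheme, link and can_reach are invented for the sketch (they are not the actual IAuthorizationScheme from my code), and the graph is reduced to plain directed edges between string identifiers.

```python
from abc import ABC, abstractmethod
from collections import deque
from typing import Dict, Set


class AuthorizationScheme(ABC):
    """What the aggregate depends on; says nothing about how it is stored."""

    @abstractmethod
    def link(self, source: str, target: str) -> None:
        """Add a directed edge, e.g. user -> group, group -> role, role -> permission."""

    @abstractmethod
    def can_reach(self, source: str, target: str) -> bool:
        """True if target is reachable from source at any depth."""


class InMemoryAuthorizationScheme(AuthorizationScheme):
    """Adjacency sets plus breadth-first search; a graph database could sit here instead."""

    def __init__(self) -> None:
        self._edges: Dict[str, Set[str]] = {}

    def link(self, source: str, target: str) -> None:
        self._edges.setdefault(source, set()).add(target)

    def can_reach(self, source: str, target: str) -> bool:
        seen = {source}
        pending = deque([source])
        while pending:
            node = pending.popleft()
            if node == target:
                return True
            for neighbour in self._edges.get(node, ()):
                if neighbour not in seen:  # guards against cycles
                    seen.add(neighbour)
                    pending.append(neighbour)
        return False


scheme: AuthorizationScheme = InMemoryAuthorizationScheme()
scheme.link("user:alice", "group:admins")
scheme.link("group:admins", "role:editor")
scheme.link("role:editor", "perm:document.edit")
scheme.link("role:editor", "group:admins")  # a cycle, to show traversal terminates

alice_can_edit = scheme.can_reach("user:alice", "perm:document.edit")
bob_can_edit = scheme.can_reach("user:bob", "perm:document.edit")
```

A database-backed class would implement the same two methods, and the aggregate would not be able to tell the difference.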
I'm building my application with JavaScript on Node.js. There is an in-memory implementation of this called LevelGraph. Maybe I could use that as well. But let's continue.
Proposal
I know that in DDD terms the infrastructure should not leak into the domain. That's what I'm trying to prevent. That's also one of the reasons I asked this question: it's the first time I have encountered such a technical need, and I am asking people who are used to coping with this kind of problem for some advice.
The interface for the collection is IAuthorizationScheme. The implementation has to support deep traversal, authorization finding, etc. This is the interface I am thinking about implementing by supporting it with a graph database.
Sequence :
1. When a user asks to execute a command, I first authenticate him. I find his organisation, and ask the OrgAuthPolicyRepository to load his organisation's corresponding OrgAuthPolicy.
2. The OrgAuthPolicyRepository loads the events from the EventStore.
3. The OrgAuthPolicyRepository creates a new OrgAuthPolicy, with a dependency-injected TGraphAuthorizationScheme instance.
4. The OrgAuthPolicyRepository applies all previous events to the OrgAuthPolicy, which in turn runs queries on the graph database to sync the state of the GraphDatabase with the aggregate.
5. The command handler executes the business rule validation checks. Some of them might include checks with the aggregate's IAuthorizationScheme.
6. The business rules have been validated, and a domain event is dispatched.
7. The aggregate handles this event, and applies it to itself. This might include changes to the IAuthorizationScheme.
8. The eventBus dispatches the event to all listening eventHandlers on the read-side.
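The sequence above can be sketched roughly as follows. This is a simplified stand-in, not my real code: the event store is a dict, the injected scheme is a plain Python set of (user, permission) pairs instead of a graph, and the event names are invented.

```python
from typing import Callable, Dict, List, Tuple

Event = Tuple[str, str, str]  # (kind, user, permission); hypothetical shape


class OrgAuthPolicy:
    """Aggregate whose collection member (the scheme) is injected."""

    def __init__(self, scheme) -> None:
        self.scheme = scheme           # stand-in for an IAuthorizationScheme
        self.pending: List[Event] = []

    def apply(self, event: Event) -> None:
        kind, user, permission = event
        if kind == "PermissionGranted":
            self.scheme.add((user, permission))
        elif kind == "PermissionRevoked":
            self.scheme.discard((user, permission))

    def grant(self, actor: str, user: str, permission: str) -> None:
        # Business rule checked against the scheme (steps 5 to 7).
        if (actor, "grant") not in self.scheme:
            raise PermissionError(f"{actor} may not grant permissions")
        event = ("PermissionGranted", user, permission)
        self.apply(event)
        self.pending.append(event)


class OrgAuthPolicyRepository:
    def __init__(self, event_store: Dict[str, List[Event]], scheme_factory: Callable) -> None:
        self._event_store = event_store
        self._scheme_factory = scheme_factory

    def load(self, org_id: str) -> OrgAuthPolicy:
        # Steps 2 to 4: fresh scheme, then replay every stored event into it.
        policy = OrgAuthPolicy(self._scheme_factory())
        for event in self._event_store.get(org_id, []):
            policy.apply(event)
        return policy

    def save(self, org_id: str, policy: OrgAuthPolicy, publish: Callable[[Event], None]) -> None:
        # Step 8: store the new events, then hand them to the read side.
        self._event_store.setdefault(org_id, []).extend(policy.pending)
        for event in policy.pending:
            publish(event)
        policy.pending.clear()


store: Dict[str, List[Event]] = {"org-1": [("PermissionGranted", "alice", "grant")]}
published: List[Event] = []
repository = OrgAuthPolicyRepository(store, scheme_factory=set)

policy = repository.load("org-1")
policy.grant("alice", "bob", "read")
repository.save("org-1", policy, published.append)
reloaded = repository.load("org-1")
```

Note that load() rebuilds the scheme from scratch on every call, which is exactly the replay cost listed among the drawbacks above when the scheme is an external graph database.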
Example :
In summary
Is it conceivable/desirable to implement entities using external databases (ex. Graph Database) so that their implementation be easier? If yes, are there examples of such implementation, or guidelines? If not, what are the drawbacks of using such a technique?
To solve your task I would consider the following variants going from top to bottom:
Reduce task complexity by employing security frameworks or identity management solutions. An existing out-of-the-box identity management solution might do the job. If it doesn't, take a look at the frameworks that help you implement your own. Unfortunately I'm not familiar enough with the Node.js world to recommend any. In the Java world that could be Apache Shiro or Spring Security. This could be a good option from both a cost and a security perspective.
Maintain a single model instead of CQRS. This eliminates consistency problems (if you decide to have separate resources to store your models). From my understanding, permissions should not be changed frequently but they will be accessed frequently. This means you can live with one model optimised for reads, avoiding consistency issues and the need to maintain 2 models. To track user behaviour you can implement auditing separately. From my experience, security auditing can require some additional data which most likely is not in your data model.
Do it with CQRS. Here I would first consider revisiting the requirements to find a way to accept eventual consistency instead of strong consistency. This opens up many options for implementation.
Regarding the question of whether you should introduce a dedicated Graph Database: it's impossible to answer without knowledge of your domain, budget, desired system throughput and performance, existing infrastructure, team knowledge and setup, etc. You need to estimate the costs of the solution with a dedicated Graph Database and without it. My feeling is that unless permission management is the main idea of your project, or your project is mature enough (by number of users and R&D capacity), a dedicated database is unlikely to pay back its costs for your task.
To understand what the benefits of a dedicated Graph Database could be, compare it against your existing storage solutions. These 2 articles explain pretty well what such benefits could be:
http://neo4j.com/developer/graph-db-vs-nosql/
http://neo4j.com/developer/graph-db-vs-rdbms/

Understanding when to use stateful services and when to rely on external persistence in Azure Service Fabric

I'm spending my evenings evaluating Azure Service Fabric as a replacement for our current WebApps/CloudServices stack, and feel a little bit unsure about how to decide when services/actors with state should be stateful actors, and when they should be stateless actors with externally persisted state (Azure SQL, Azure Storage and DocumentDB). I know this is a fairly new product (to the general public at least), so there's probably not a lot of best practices in regards to this yet, but I've read through most of the documentation made available by Microsoft without finding a definite answer for this.
The current problem domain I'm approaching is our event store; parts of our applications are based on event sourcing and CQRS, and I'm evaluating how to move this event store over to the Service Fabric platform. The event store is going to contain a lot of time-series data, and as it's our only source of truth for the data being persisted there, it must be consistent, replicated and stored to some form of durable storage.
One way I have considered doing this is with a stateful "EventStream" actor; each instance of an aggregate using event sourcing stores its events within an isolated stream. This means the stateful actor could keep track of all the events for its own stream, and I'd have met my requirements as to how the data is stored (transactional, replicated and durable). However, some streams may grow very large (hundreds of thousands, if not millions, of events), and this is where I'm starting to get unsure. Having an actor with a large amount of state will, I imagine, have an impact on the performance of the system when these large data models need to be serialized to or deserialized from disk.
Another option is to keep these actors stateless, and have them just read their data from some external storage like Azure SQL - or just go with stateless services instead of actors.
Basically, when is the amount of state for an actor/service "too much" and you should start considering other ways of handling state?
Also, this section in the Service Fabric Actors design pattern: Some anti-patterns documentation leaves me a little bit puzzled:
Treat Azure Service Fabric Actors as a transactional system. Azure Service Fabric Actors is not a two phase commit-based system offering ACID. If we do not implement the optional persistence, and the machine the actor is running on dies, its current state will go with it. The actor will be coming up on another node very fast, but unless we have implemented the backing persistence, the state will be gone. However, between leveraging retries, duplicate filtering, and/or idempotent design, you can achieve a high level of reliability and consistency.
What does "if we do not implement the optional persistence" indicate here? I was under the impression that as long as the transaction modifying the state succeeded, the data was persisted to durable storage and replicated to at least a subset of the replicas. This paragraph leaves me wondering if there are situations where state within my actors/services will get lost, and if this is something I need to handle myself. The impression I got from the stateful model in other parts of the documentation seems to contradict this statement.
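For reference, this is how I read the "retries, duplicate filtering, and/or idempotent design" part of the quote, as a plain sketch that has nothing to do with the Service Fabric API. The class, the message IDs and the flaky sender are all made up for illustration.

```python
from typing import Callable, Set, TypeVar

T = TypeVar("T")


class Account:
    """Receiver that filters duplicates so redelivery is harmless."""

    def __init__(self) -> None:
        self.balance = 0
        self._seen: Set[str] = set()  # would need persisting in a real system

    def handle_deposit(self, message_id: str, amount: int) -> bool:
        if message_id in self._seen:
            return False  # duplicate: already applied, ignore
        self.balance += amount
        self._seen.add(message_id)
        return True


def send_with_retries(send: Callable[[], T], attempts: int = 3) -> T:
    """Call send() until it succeeds or the attempts are used up."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for _ in range(attempts):
        try:
            return send()
        except ConnectionError as error:
            last_error = error
    raise last_error


account = Account()
calls = {"count": 0}


def flaky_send() -> str:
    # The deposit is applied, but the first acknowledgement is "lost",
    # so the sender retries and the receiver sees the message twice.
    calls["count"] += 1
    account.handle_deposit("msg-1", 50)
    if calls["count"] == 1:
        raise ConnectionError("acknowledgement lost")
    return "acked"


result = send_with_retries(flaky_send)
```

The message is delivered twice but applied once, which is the kind of reliability the quote seems to be describing in place of a transaction.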
One option that you have is to keep 'some' of the state in the actor (let's say what could be considered to be hot data that needs to be quickly available) and store everything else on a 'traditional' storage infrastructure such as SQL Azure, DocDB, ....
It is difficult to have a general rule about too much local state but, maybe, it helps to think about hot vs. cold data.
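One way to picture the hot vs. cold split is sketched below. It is not Service Fabric code: the actor is a plain class, the external store is a Python list standing in for something like SQL Azure or DocDB, and the capacity of the hot window is an arbitrary assumption. In this variant every event is written to the external store and only the most recent ones are also kept in the actor.

```python
from collections import deque
from typing import Deque, List


class EventStreamActor:
    """Keeps a bounded window of recent events; the full stream lives elsewhere."""

    def __init__(self, cold_store: List[str], hot_capacity: int = 3) -> None:
        self._cold = cold_store                                # stand-in for external storage
        self._hot: Deque[str] = deque(maxlen=hot_capacity)     # bounded actor state

    def append(self, event: str) -> None:
        self._cold.append(event)  # durable write first
        self._hot.append(event)   # oldest hot event is dropped automatically

    def recent(self) -> List[str]:
        """Served from actor state, no external call."""
        return list(self._hot)

    def read_all(self) -> List[str]:
        """Full history comes from the external store."""
        return list(self._cold)


cold: List[str] = []
actor = EventStreamActor(cold, hot_capacity=2)
for number in range(5):
    actor.append(f"event-{number}")

hot_events = actor.recent()
all_events = actor.read_all()
```

The actor's own state stays the same size no matter how long the stream grows, which addresses the serialization concern in the question.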
Reliable Actors also offer the ability to customize the StateProvider so you can also consider implementing a customized StateProvider (by implementing the IActorStateProvider) with the specific policies that you need to be more efficient with the requirements that you have in terms of amount of data, latency, reliability and so on (note: documentation is still very minimal on the StateProvider interface but we can publish some sample code if this is something you want to pursue).
About the anti-patterns: the note is more about implementing transactions across multiple actors. Reliable Actors provides a full guarantee on the reliability of the data within the boundaries of an actor. Because of the distributed and loosely coupled nature of the Actor model, implementing transactions that involve multiple actors is not a trivial task. If 'distributed' transactions are a strong requirement, the Reliable Services programming model is probably a better fit.
I know this has been answered, but recently found myself in the same predicament with a CQRS/ES system and here's how I went about it:
Each Aggregate was an actor with only the current state stored in it.
On a command, the aggregate would effect a state change and raise an event.
Events themselves were stored in a DocDb.
On activation, an AggregateActor instance reads its events from DocDb, if available, to recreate its state. This is obviously only performed once per actor activation. This took care of the case where an actor instance is migrated from one node to another.
To answer @Trond's secondary question, which is, "What does 'if we do not implement the optional persistence' indicate here?"
An actor is always a stateful service, and its state can be configured, using an attribute on the actor class, to operate in one of three modes:
Persisted. The state is replicated to all replica instances, and it is also written to disk. Thus the state is maintained even if all replicas are shut down.
Volatile. The state is replicated to all replica instances, in memory only. This means that as long as one replica instance is alive, the state is maintained. But when all replicas are shut down, the state is lost and cannot be recovered after they are restarted.
No persistence. The state is not replicated to other replica instances, nor to disk. This provides the least state protection.
A full discussion of the topic can be found in the Microsoft documentation.

Separation of concerns in Node.js app and dealing with load across different processes

I have a Node application which persists data to a MongoDB database. Most of this data is in hand, such as data for the User collection. However, the application also has the concept of Website collection, and for this collection, data must first be downloaded from somewhere before it is saved.
I am wondering how I should separate the above concerns in my application. At the service layer, I have things like User and Website. They provide basic CRUD operations. At completely the opposite end of the spectrum, there is a user interface whereby users can input a website URL. Somewhere between this UI and the application persisting the data to MongoDB (the service layer), the application must make a request to this URL to gather some data. Once the data has been fetched, the Website service will persist it.
Potentially, there could be thousands of these URLs entered at once, and I do not want to bring down the Node process that handles the web server due to load issues. Therefore I think it would be a good idea to abstract the work out to a different process and use some sort of messaging bus to tie the application together.
It seems that you've decomposed the system correctly, and have created that separation at the persistence "service" layer, but I'd take this separation a bit further by moving toward a distributed system architecture (i.e. SOA / microservices).
The initial step of building a distributed system is identifying each of the functions necessary to meet the overall business goal of the application and mapping these to service endpoints. Each loosely coupled service endpoint will then serve a small isolated job/function and it will act as an abstraction for that business goal.
By continuing the separation of responsibilities all the way to the service endpoint you create small independent boundaries for scalability, throughput, fault tolerance, security, deployment, etc.
For example, RESTfully speaking, this might mean service endpoints for both Users (e.g. /users/{userid}) and Websites (e.g. /websites/{websiteid|url})... and perhaps an additional Resource to maintain the relationship/link between the two (e.g. /users/{userid}/userwebsites : {websiteid:1234,url:blah.com}).
This separation would mean you can handle the website processing responsibility independently, which would have a number of benefits beyond just handling the different load characteristics.
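As a very small illustration of handling the website processing independently, the web-facing part can just enqueue URLs while separate workers fetch and persist them. The sketch uses Python threads and an in-process queue purely to stay self-contained; in the setup discussed here the queue would be a real message bus between processes, and fetch and save are hypothetical callables standing in for the HTTP request and the Website service.

```python
import queue
import threading
from typing import Callable, Dict, List


def start_workers(
    jobs: "queue.Queue",
    fetch: Callable[[str], str],
    save: Callable[[str, str], None],
    count: int = 2,
) -> List[threading.Thread]:
    """Start worker threads that drain the queue until they receive None."""

    def worker() -> None:
        while True:
            url = jobs.get()
            try:
                if url is None:  # sentinel: stop this worker
                    return
                # A failing fetch would end this worker; real code needs error handling.
                save(url, fetch(url))
            finally:
                jobs.task_done()

    threads = [threading.Thread(target=worker, daemon=True) for _ in range(count)]
    for thread in threads:
        thread.start()
    return threads


jobs: "queue.Queue" = queue.Queue()
saved: Dict[str, str] = {}

# Stand-ins: a fake download and a dict instead of MongoDB.
threads = start_workers(
    jobs,
    fetch=lambda url: f"<title of {url}>",
    save=saved.__setitem__,
)

# The web server's only job is to enqueue; it never does the download itself.
for url in ["a.example", "b.example", "c.example"]:
    jobs.put(url)

jobs.join()                 # wait until every URL has been processed
for _ in threads:
    jobs.put(None)          # one sentinel per worker
for thread in threads:
    thread.join()
```

The number of workers can then be tuned (or moved to other machines) without touching the part that accepts user input.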
