Event Sourcing - Event Store - domain-driven-design

I am trying to understand DDD, event sourcing, CQRS, etc.
Let's consider an e-commerce application with the microservices below.
order-service
shipping-service
payment-service
Can you clarify these questions?
Can we think of a domain as the large application and a bounded context as an individual microservice?
Will each bounded context/microservice maintain its own event store? (Basically, can one domain have multiple event stores?)
If there is going to be one event store per domain, who takes ownership of the event store?

Typically, a (logical) service will have exclusive authority to modify one or more streams.
Whether those streams are all together in a single durable store, or distributed across multiple stores, isn't particularly important so long as the service knows how to find the streams.
Similarly, it's not typically all that important that each service has its own store. Functionally, the important thing is that the different services not write to streams that are outside of their jurisdiction. So long as you can be confident that two services won't be trying to use the same stream identifier, it should be fine.
Note that both of these guidelines are the same ones you would use if your services were writing rows into tables in an RDBMS. Tables don't have to be in the same database, so long as the service knows which database holds which tables. Similarly, two different services can share the same database so long as they don't write into each other's tables.
There are, of course, non-functional reasons that you might want the storage for different services to be isolated. For instance, if one service wants to upgrade to a new version of storage while another needs to lag behind, it will be a lot more convenient if the services are not sharing a database. Similarly, certain kinds of audits will be more easily satisfied by isolating data storage.
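As a rough illustration of the "exclusive authority over streams" rule, here is a minimal in-memory sketch. Everything in it is hypothetical (the `EventStore` class, the convention that the part of the stream id before the first `-` identifies the owner); it is not the API of any real event store. It only shows that several services can share one store as long as writes are checked against ownership.

```python
class EventStore:
    """Toy in-memory store; one instance may be shared by several services."""

    def __init__(self):
        self._streams = {}  # stream id -> list of events
        self._owners = {}   # stream-id prefix -> owning service (assumed convention)

    def register(self, service, prefix):
        # Refuse a prefix that another service has already claimed.
        current = self._owners.setdefault(prefix, service)
        if current != service:
            raise PermissionError(f"{prefix!r} already belongs to {current}")

    def append(self, service, stream_id, event):
        # Only the owner of the prefix may write to the stream.
        prefix = stream_id.split("-", 1)[0]
        if self._owners.get(prefix) != service:
            raise PermissionError(f"{service} may not write to {stream_id}")
        self._streams.setdefault(stream_id, []).append(event)

    def read(self, stream_id):
        # Reading is open to everyone; only writes are restricted.
        return list(self._streams.get(stream_id, []))
```

With this in place, `order-service` can append to `order-42`, while an append to the same stream by `payment-service` is rejected, whether or not the two services share the physical store.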
If I go with CQRS for order-service, who is supposed to consume payment events: the command side or the read side of order-service?
If your ordering domain dynamics need information from payments, then the command side of ordering will need a copy of the information from payments.
The payments information is an unlocked copy of the data - the authoritative copy of that information in payments may be changing even as we are updating orders.
Assuming you don't want to tightly couple ordering to the domain dynamics of payments, the copy of the payments information used by ordering will normally be a report (aka a "read model") rather than a copy of the entire history.
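A minimal sketch of that idea, assuming hypothetical event names (`PaymentAuthorized`, `PaymentDeclined`) and a hypothetical `confirm_order` command handler: the command side of ordering keeps a small, possibly stale report of payment state and consults it when making decisions, rather than replaying the whole payments history.

```python
from dataclasses import dataclass, field


@dataclass
class PaymentStatusView:
    """Ordering's local, unlocked copy of payment state (a read model)."""
    status_by_order: dict = field(default_factory=dict)

    def apply(self, event):
        # Keep only what ordering needs: the latest known status per order.
        if event["type"] == "PaymentAuthorized":
            self.status_by_order[event["order_id"]] = "authorized"
        elif event["type"] == "PaymentDeclined":
            self.status_by_order[event["order_id"]] = "declined"


def confirm_order(order_id, payments):
    """Command handler on the write side of ordering (hypothetical)."""
    status = payments.status_by_order.get(order_id, "unknown")
    if status != "authorized":
        raise ValueError(f"order {order_id} cannot be confirmed: payment {status}")
    return {"type": "OrderConfirmed", "order_id": order_id}
```

Because the copy is unlocked, the decision is made against whatever ordering last heard from payments; the design has to tolerate that information being slightly out of date.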

Related

Questions pertaining to micro-service architecture

I have a couple of questions that exist around micro service architecture, for example take the following services:
orders,
account,
communication &
management
Question 1: From what I read, I understand that each service is supposed to have ownership of the data pertaining to that service, so orders would have an orders database. How important is that data ownership? Would microservices make sense if they all read from one traditional database, such that all data pertaining to the services existed in one database? If so, are there any implications of structuring the services this way?
Question 2: Services should be able to communicate with one another. How would that be any different from simply curling an existing API and basing the logic on that response? Is calling a service more efficient than simply curling the API?
Question 3: Is it worth it? I understand this is a massive generality, and it's fundamentally predicated on the needs of the business. But once that discussion has been had, was the rebuild worth it, and what challenges can you expect to face?
I will try to answer all the questions.
Regarding all services using the same database: if you do so, you have two main problems. First, the database becomes a bottleneck because all requests go to the same point. Second, you have coupled all your services, so if the database goes down or needs an update, all your services are affected. (The database becomes a single point of failure.)
The communication between services can be whatever your services need (synchronous, asynchronous, via message passing through a message broker, etc.); it all depends on the use cases you have to support. The recommended way to avoid temporal coupling is to use a message broker like Kafka. That way your services don't have to know about each other, and if some of them go down, the others keep working. When they are up again, they can continue processing the messages that are pending. However, if your services need to respond synchronously, you can define synchronous communication between services and use a circuit breaker to behave properly in case the callee service is down.
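To make the circuit breaker idea concrete, here is a minimal sketch. The class name, thresholds and the injectable `clock` parameter are my own assumptions for illustration; production libraries offer more states and options. After a number of consecutive failures the breaker "opens" and fails fast instead of calling the downed service, then allows a single trial call once a cool-down has passed.

```python
import time


class CircuitOpenError(Exception):
    """Raised instead of calling a service that is currently considered down."""


class CircuitBreaker:
    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after  # cool-down in seconds
        self.clock = clock              # injectable so it can be faked in tests
        self.failures = 0
        self.opened_at = None

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("callee is considered down")
            # Cool-down elapsed: allow one trial call (half-open state).
            # A single failure now re-opens the circuit immediately.
            self.opened_at = None
            self.failures = self.max_failures - 1
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        self.failures = 0  # any success closes the circuit again
        return result
```

The caller wraps each synchronous request, for example `breaker.call(fetch_order, order_id)` where `fetch_order` is whatever function performs the remote call, and treats `CircuitOpenError` as "use a fallback or report the dependency as unavailable".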
A microservices architecture is far more complicated to build, monitor and debug than a traditional monolith, so it is only worth it if you have very demanding scalability and availability requirements, and/or if the system is very large and requires several teams working on different parts of it. In that case it is advisable to avoid dependencies among them, so that each team can work at its own pace, deploying its own services.

Learning DDD and CQRS

I'm new to DDD and CQRS and I'm planning to build a simple application to improve my skills a bit.
What I'm planning to do is a simple Taxi Corp application.
Requirements:
Client orders a taxi.
Client can have only one order at a time.
Driver picks an order.
Driver can have only one order at a time.
Driver goes to client.
Client enters cab.
Course starts.
Course finishes.
Client is charged and driver is paid.
And so on.
I can see there can be three aggregates: Client, Order and Driver. I want to split them into separate microservices. Do you think it's a good idea or I should start with one microservice?
I'm currently focused on ordering a taxi. First of all, I need to check that the client doesn't already have a course assigned; after that I can create an order. After the order is created, I need to assign it to the client. Since only one aggregate can be updated/created during one request, I wonder how to do this correctly. I've read something about Process Managers and I think they will be very useful in this case. I even drew a diagram of the communication. Can anyone tell me if my approach is correct and give me some tips on how to go further?
Process of creating an order
Do you think it's a good idea or I should start with one microservice?
I refer you to the wisdom of John Gall
A complex system that works is invariably found to have evolved from a simple system that worked. A complex system designed from scratch never works and cannot be patched up to make it work. You have to start over, beginning with a working simple system.
Instead of worrying about microservices, give your attention to messages.
Someone said: "If you have more microservices than customers, you are doing it wrong".
And if you really follow the CQRS/ES approach, the resulting system is much easier to split apart than a traditional ORM monolith.
So focus on the domain first and start with a monolith.
Start with the microservices design. Even if you get it wrong, you gain better insight into the desired architecture, because problems in a microservices design show themselves very soon.
Client and driver are both users of the system and have some commonalities, so you can consider them one domain with one microservice.
Consider an order manager microservice that assigns a client and a driver to a trip by their ids. The order database may include a trips table with two id keys, driver id and client id, and some columns for the different states. After each trip finishes you can remove it from the trips table and insert it into an archive table. Alternatively, you can leave it there and partition the table daily to keep database performance high.
Consider an accounting microservice for keeping payments and transactions. It's OK if you opt for NoSQL databases for the other microservices, but do use a SQL database for your transactions.
You may need another microservice for reporting and dashboards. Mirror the other databases into a new one for reporting.
You also need an API gateway to route requests to the microservices and handle authentication.
Your process is a set of events. You will certainly expand the system later on and will perhaps have some long-running tasks, so it is better to have a message broker and implement your flow as an event/task flow using patterns like event sourcing.
I can see there can be three aggregates: Client, Order and Driver. I
want to split them into separate microservices. Do you think it's a
good idea or I should start with one microservice?
They all belong to the same bounded context. A bounded context translates nicely to a microservice (see Eric Evans' video: https://www.infoq.com/news/2015/06/dddx-microservices-boundaries). But don't start by designing a microservice; that is doing it in the wrong order. First design your bounded context, then, if it makes sense, create a microservice around the hexagonal architecture.
After the order is created, I need to assign it to client. As during
one request only one aggregate can be updated/created I wonder how to
do it correctly.
This is the perfect example of why you need to do it all in the same process.
But if you want to go with multiple microservices, think about eventual consistency (https://en.wikipedia.org/wiki/Eventual_consistency) and create a message-driven architecture between your services. That might be too much work in my opinion, but for learning purposes it can be a good idea.
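For the learning case, a message-driven flow with a process manager could be sketched roughly as below. All message names (`TaxiRequested`, `ReserveClient`, `ClientReserved`, `ClientAlreadyBusy`, `CreateOrder`, `OrderCreated`) and classes are hypothetical, and an in-memory `deque` stands in for a real message broker. Each handler touches only its own aggregate; the process manager just reacts to events and issues the next command, so the system becomes consistent eventually rather than within one request.

```python
from collections import deque


class OrderProcessManager:
    """Tracks one ordering process and decides which command comes next."""

    def __init__(self, bus):
        self.bus = bus
        self.state = {}  # order_id -> progress of the process

    def handle(self, msg):
        kind, order_id = msg["type"], msg["order_id"]
        if kind == "TaxiRequested":
            self.state[order_id] = "reserving_client"
            self.bus.append({"type": "ReserveClient", "order_id": order_id,
                             "client_id": msg["client_id"]})
        elif kind == "ClientReserved":
            self.state[order_id] = "creating_order"
            self.bus.append({"type": "CreateOrder", "order_id": order_id})
        elif kind == "ClientAlreadyBusy":
            self.state[order_id] = "rejected"
        elif kind == "OrderCreated":
            self.state[order_id] = "completed"


class ClientService:
    """Owns the Client aggregate: a client can have only one order at a time."""

    def __init__(self, bus):
        self.bus = bus
        self.busy = set()

    def handle(self, msg):
        if msg["type"] != "ReserveClient":
            return
        if msg["client_id"] in self.busy:
            reply = "ClientAlreadyBusy"
        else:
            self.busy.add(msg["client_id"])
            reply = "ClientReserved"
        self.bus.append({"type": reply, "order_id": msg["order_id"]})


class OrderService:
    """Owns the Order aggregate."""

    def __init__(self, bus):
        self.bus = bus
        self.orders = set()

    def handle(self, msg):
        if msg["type"] == "CreateOrder":
            self.orders.add(msg["order_id"])
            self.bus.append({"type": "OrderCreated", "order_id": msg["order_id"]})


def run(bus, handlers):
    # Deliver every queued message to every handler until the queue is empty.
    while bus:
        msg = bus.popleft()
        for handler in handlers:
            handler.handle(msg)
```

Putting a `TaxiRequested` message on the bus and calling `run` walks the process through to `completed`; a second request for the same client ends in `rejected` without an order being created.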

Data access layer patterns using azure function

We are currently working on a design using Azure functions with Azure storage queue binding.
Each message in the queue represents a complete transaction. An Azure function will be bound to that queue so that the function will be triggered as soon as there is a new message in the queue.
The function will then commit the transaction in a SQL DB.
The first-cut implementation is also complete, and it's working fine. However, in retrospect, we are considering the following:
In a typical DAL, there are well-established design patterns using Entity Framework, repository patterns, etc. However, we didn't find similar guidance or best practices for implementing a DAL within serverless code.
Therefore, my question is: should such patterns be implemented with Azure Functions (this would be challenging :) ), should the serverless code be kept as light as possible, or is this not a use case for Azure Functions at all?
It doesn't take anything too special. We're using a routine set of library DLLs for all kinds of things: database access, interacting with other parts of Azure (like retrieving Key Vault secrets for connection strings), parsing file uploads, business rules, and so on. The libraries target netstandard2.0 so we can more easily migrate to Functions v2 when the right triggers become available.
Mainly just design your libraries so they're highly modularized, so you can minimize how much you load to get the job done (assuming reuse in other areas of the system is important, which it usually is).
It would be easier if dependency injection was available today. See this for a few ways some of us have hacked it together until we get official DI support. (DI is on the roadmap for Functions, I believe the 3.0 release.)
At first I was a little worried about startup time with the library approach, but the underlying WebJobs stack itself is already pretty heavy, and Functions startup performance seems to vary wildly anyway (on the cheaper tiers, at least). During testing, one of our infrequently executed Functions has varied from just ~300ms to a peak of ~3800ms to parse the exact same test file (with all but ~55ms spent on startup).
should such patterns be implemented with Azure functions (this would
be challenging :) ), or should the server-less code be kept as light
as possible or this is not a use-case for azure functions, at all?
My answer is NO.
There should be patterns to follow, but the traditional repository patterns and CRUD operations do not seem to be valid in the cloud era.
Many strong concepts we were raised to adhere to have become invalid these days.
Denormalizing the database has become not only acceptable but preferable.
Designing a pattern now depends on the database you selected for your solution, as well as on the type of your application and the type of your data.
The Table Storage design guidelines are a good general reference.
Is your application read-heavy or write-heavy? The design will vary accordingly.
Are you using Azure Tables or Mongo? There are design decisions based on that. Indexing is important in Mongo, while Azure Tables gives you no secondary indexes to work with.
Sharding considerations.
Redundancy considerations.
In modern development and architecture many principles have changed: each microservice has its own database, which might be totally different from any other microservice's.
If you read along the guidelines that I provided, you will see what I mean.
Designing your Table service solution to be read efficient:
Design for querying in read-heavy applications. When you are designing your tables, think about the queries (especially the latency sensitive ones) that you will execute before you think about how you will update your entities. This typically results in an efficient and performant solution.
Specify both PartitionKey and RowKey in your queries. Point queries such as these are the most efficient table service queries.
Consider storing duplicate copies of entities. Table storage is cheap so consider storing the same entity multiple times (with different keys) to enable more efficient queries.
Consider denormalizing your data. Table storage is cheap so consider denormalizing your data. For example, store summary entities so that queries for aggregate data only need to access a single entity.
Use compound key values. The only keys you have are PartitionKey and RowKey. For example, use compound key values to enable alternate keyed access paths to entities.
Use query projection. You can reduce the amount of data that you transfer over the network by using queries that select just the fields you need.
Designing your Table service solution to be write efficient:
Do not create hot partitions. Choose keys that enable you to spread your requests across multiple partitions at any point of time.
Avoid spikes in traffic. Smooth the traffic over a reasonable period of time and avoid spikes in traffic.
Don't necessarily create a separate table for each type of entity. When you require atomic transactions across entity types, you can store these multiple entity types in the same partition in the same table.
Consider the maximum throughput you must achieve. You must be aware of the scalability targets for the Table service and ensure that your design will not cause you to exceed them.
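A few of the guidelines above (compound keys, point queries, denormalized summary entities, avoiding hot partitions) can be sketched in plain Python. This does not use any Azure SDK; the dictionary `table`, the key formats and all function names are assumptions made up for illustration. Only the `PartitionKey`/`RowKey` concept comes from Table storage itself.

```python
import hashlib

table = {}  # (PartitionKey, RowKey) -> entity; stands in for one storage table


def upsert(entity):
    table[(entity["PartitionKey"], entity["RowKey"])] = entity


def point_query(partition_key, row_key):
    # Specifying both keys is the cheapest kind of lookup.
    return table.get((partition_key, row_key))


def order_entity(customer_id, order_id, total):
    # Compound RowKey (type prefix + id) lets several entity types share
    # one partition, which is what allows atomic batches across them.
    return {"PartitionKey": f"customer_{customer_id}",
            "RowKey": f"order_{order_id}", "total": total}


def summary_entity(customer_id, order_count, total_spent):
    # Denormalized summary: aggregate queries read one entity only.
    return {"PartitionKey": f"customer_{customer_id}", "RowKey": "summary",
            "order_count": order_count, "total_spent": total_spent}


def bucketed_partition(key, buckets=16):
    # Hash-based bucket prefix spreads writes across partitions
    # instead of sending them all to one hot partition.
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return f"{int(digest, 16) % buckets:02d}_{key}"
```

For example, after `upsert(order_entity("c1", "1001", 25.0))` and `upsert(summary_entity("c1", 1, 25.0))`, a dashboard can fetch `point_query("customer_c1", "summary")` without scanning the customer's orders. The cost is that the summary has to be kept in step with the orders on every write.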

Separation of concerns in Node.js app and dealing with load across different processes

I have a Node application which persists data to a MongoDB database. Most of this data is in hand, such as data for the User collection. However, the application also has the concept of Website collection, and for this collection, data must first be downloaded from somewhere before it is saved.
I am wondering how I should separate the above concerns in my application. At the service layer, I have things like User and Website. They provide basic CRUD operations. At completely the opposite end of the spectrum, there is a user interface where users can input a website URL. Somewhere between this UI and the application persisting the data to MongoDB (the service layer), the application must make a request to this URL to gather some data. Once the data has been fetched, the Website service will persist it.
Potentially, there could be thousands of these URLs entered at once, and I do not want to bring down the Node process that handles the web server due to load issues. Therefore I think it would be a good idea to abstract the work out to a different process and use some sort of messaging bus to tie the application together.
It seems that you've decomposed the system correctly (and have created that separation at the persistence "service" layer), but I'd take this separation a bit further by moving toward a distributed system architecture (i.e. SOA / microservices).
The initial step of building a distributed system is identifying each of the functions necessary to meet the overall business goal of the application and mapping these to service endpoints. Each loosely coupled service endpoint will then serve a small isolated job/function and it will act as an abstraction for that business goal.
By continuing the separation of responsibilities all the way to the service endpoint you create small independent boundaries for scalability, throughput, fault tolerance, security, deployment, etc.
For example, RESTfully speaking, this might mean service endpoints for both Users (e.g. /users/{userid}) and Websites (e.g. /websites/{websiteid|url}), and perhaps an additional resource to maintain the relationship/link between the two (e.g. /users/{userid}/userwebsites : {websiteid:1234,url:blah.com}).
This separation would mean you can handle the website processing responsibility independently, which would have a number of benefits beyond just handling the different load characteristics.

Creating incremental reports using Azure Tables

I need to create incremental reports in the table storage. I need to be able to update the same records from several different worker role instances (different roles with several instances each).
My reports consist mainly of values that I need to increment after I parse the raw data I initially stored.
The optimistic solution I found is to use a retry mechanism: Try to update the record. If you get a 412 result code (you don't have the latest ETAG value), retry. This solution becomes less efficient and more costly the more users you have and the more data you need to update simultaneously (my case exactly).
Another solution that comes to mind is to have only one instance of one worker role that can possibly update any given record. This is very problematic because this means that I will by-design create bottlenecks in my architecture, which is the opposite of the scale I want to reach with Azure.
If anyone here has some best practices in mind for such a use case, I would love to hear it.
Most cloud storage services (Table Storage is one of them) do not offer scalable writes on a single entity/blob/whatever. There is no quick fix for this limitation, as it comes from the core tradeoffs that were made to create cloud storage in the first place.
Basically, a storage unit (entity/blob/whatever) can be updated about once every 20ms, and that's about it. Having a dedicated worker or not will not change anything in this respect.
Instead, you need to approach your task from a different angle. For counters, the most usual approach is to use sharded counters (the link is for GAE, but you can implement equivalent behavior on Azure).
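The sharded counter idea, in a minimal in-memory sketch: the class, the key format and the default of 8 shards are assumptions for illustration, and the `rows` dictionary stands in for table entities rather than calling any storage API. Each increment lands on a randomly chosen shard, so concurrent writers rarely hit the same entity; a read sums all shards.

```python
import random


class ShardedCounter:
    def __init__(self, name, shards=8, rng=random):
        self.name = name
        self.shards = shards
        self.rng = rng   # injectable source of randomness
        self.rows = {}   # (PartitionKey, RowKey) -> count

    def increment(self, amount=1):
        # Pick a random shard so concurrent writers rarely touch the same entity.
        shard = self.rng.randrange(self.shards)
        key = (self.name, f"shard_{shard:03d}")
        self.rows[key] = self.rows.get(key, 0) + amount

    def value(self):
        # Reads pay the cost: every shard of this counter is summed.
        return sum(count for (pk, _), count in self.rows.items()
                   if pk == self.name)
```

With real storage, each shard update would still use the optimistic ETag check, but the chance of a 412 drops roughly in proportion to the number of shards, at the price of a more expensive read.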
Another way to ease the pain is to go for an asynchronous architecture à la CQRS, where the performance constraints you put on the update latency of entities are significantly relaxed.
I believe the approach needs re-architecture. In order to ensure scalability and limit the amount of contention, you want to make sure that every write can work optimistically by providing a unique Table/PartitionKey/RowKey.
If you need those values to be merged together for reports, have a separate process/worker that post-aggregates/merges the records for reporting purposes. You can use a queue or a timing mechanism to start the aggregation/merging.
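A minimal sketch of that write-unique-rows-then-merge approach follows. The function names, the key layout and the dictionary standing in for a table are assumptions, not an Azure API. Every worker writes its increment under a key nobody else will ever use, so writes never conflict; a separate aggregation step merges them for the report.

```python
import uuid
from collections import Counter


def record_increment(rows, report_id, metric, amount, worker_id):
    # A fresh RowKey per write means no two workers update the same entity.
    row_key = f"{worker_id}_{uuid.uuid4().hex}"
    rows[(report_id, row_key)] = {"metric": metric, "amount": amount}


def aggregate(rows, report_id):
    # Run by a separate worker, triggered by a queue message or a timer.
    totals = Counter()
    for (partition_key, _), row in rows.items():
        if partition_key == report_id:
            totals[row["metric"]] += row["amount"]
    return dict(totals)
```

The report is then only as fresh as the last aggregation run, which is the usual tradeoff for removing contention on the write path.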

Resources