How to perform validation across services in microservices - domain-driven-design

Suppose there are two microservices: Order and Inventory. There is an API in order service that takes ProductId, Qty etc and place the order.
Ideally order should only be allowed to place if inventory exists in inventory service. People recommend to have Saga pattern or any other distributed transactions. That is fine and eventually consistency will be utilized.
But what if somebody wants to abuse the system. He can push orders with products (ProductIds) which are either invalid or out of inventory. System will be taking all these orders and place these orders in queue and Inventory service will be handling these invalid order.
Shouldn't this be handled upfront (in order service) rather than pushing these invalid orders to the next level (specially where productId is invalid)
What are the recommendations to handle these scenarios?

What are the recommendations to handle these scenarios?
Give your order service access to the data that it needs to filter out undesirable orders.
The basic plot would be that, while the Inventory service is the authority for the state of inventory, your Orders service can work with a cached copy of the inventory to determine which orders to accept.
Changes to the Inventory are eventually replicated into the cache of the Orders service -- that's your "eventual consistency". If Inventory drops off line for a time, Order's can continue providing business value based on the information in its cache.
You may want to be paying attention to the age in the data in the cache as well -- if too much time has passed since the cache was last updated, then you may want to change strategies.
Your "aggregates" won't usually know that they are dealing with a cache; you'll pass along with the order data a domain service that supports the queries that the aggregate needs to do its work; the implementation of the domain service accesses the cache to provide answers.
So long as you don't allow the abuser to provide his own instance of the domain service, or to directly manipulate the cache, then the integrity of the cached data is ensured.
(For example: when you are testing the aggregate, you will likely be providing cached data tuned to your specific test scenario; that sort of hijacking is not something you want the abuser to be able to achieve in your production environment).

You most definitely would want to ensure up-front that you can catch as many invalid business cases as possible. There are a couple ways to deal with this. It is the same situation as one would have when booking a seat on an airline. Although they do over-booking which we'll ignore for now :)
Option 1: You could reserve an inventory item as part of the order. This is more of a pessimistic approach but your item would be reserved while you wait for the to be confirmed.
Option 2: You could accept the order only if there is an inventory item available but not reserve it and hope it is available later.
You could also create a back-order if the inventory item isn't available and you want to support back-orders.
If you go with option 1 you could miss out on a customer if an item has been reserved for customer A and customer B comes along and cannot order. If customer A decides not to complete the order the inventory item becomes available again but customer B has now gone off somewhere else to try and source the item.
As part of the fulfillment of your order you have to inform the inventory bounded context that you are now taking the item. However, you may now find that both customer A and B have accepted their quote and created an order for the last item. One is going to lose out. At this point the one not able to be fulfilled will send a mail to the customer and inform them of the unfortunate situation and perhaps create a back-order; or ask the customer to try again in X-number of days.
Your domain experts should make the call as to how to handle the scenarios and it all depends on item popularity, etc.

I will not try to convince you to not do this checking before placing an order and to rely on Sagas as it is usually done; I will consider that this is a business requirement that you must implement.
This seems like a new sub-domain to me: bad-behavior-prevention (or how do you want to call it) that comes with a new responsibility: to prevent abusers. You could add this responsibility to the Order microservice but you would break the SRP. So, it should be done in another microservice.
This new microservice is called from your API Gateway (if you have one) or from the Orders microservice.
If you do not to add a new microservice (from different reasons) then you could implement this new functionality as a module inside of the Orders microservice but I strongly recommend to make it highly decoupled from its host (separate and private persistence/database/table).

Related

CQRS Aggregate and Projection consistency

Aggregate can use View this fact is described in Vaughn Vernon's book:
Such Read Model Projections are frequently used to expose information to various clients (such as desktop and Web user interfaces), but they are also quite useful for sharing information between Bounded Contexts and their Aggregates. Consider the scenario where an Invoice Aggregate needs some Customer information (for example, name, billing address, and tax ID) in order to calculate and prepare a proper Invoice. We can capture this information in an easy-to-consume form via CustomerBillingProjection, which will create and maintain an exclusive instance of CustomerBilling-View. This Read Model is available to the Invoice Aggregate through the Domain Service named IProvideCustomerBillingInformation. Under the covers this Domain Service just queries the document store for the appropriate instance of the CustomerBillingView
Let's imagine our application should allow to create many users, but with unique names. Commands/Events flow:
CreateUser{Alice} command sent
UserAggregate checks UsersListView, since there are no users with name Alice, aggregate decides to create user and publish event.
UserCreated{Alice} event published // By UserAggregate
UsersListProjection processed UserCreated{Alice} // for simplicity let's think UsersListProjection just accumulates users names if receives UserCreated event.
CreateUser{Bob} command sent
UserAggregate checks UsersListView, since there are no users with name Bob, aggregate decides to create user and publish event.
UserCreated{Bob} event published // By UserAggregate
CreateUser{Bob} command sent
UserAggregate checks UsersListView, since there are no users with name Bob, aggregate decides to create user and publish event.
UsersListProjection processed UserCreated{Bob} .
UsersListProjection processed UserCreated{Bob} .
The problem is - UsersListProjection did not have time to process event and contains irrelevant data, aggregate used this irrelevant data. As result - 2 users with the same name created.
how to avoid such situations?
how to make aggregates and projections consistent?
how to make aggregates and projections consistent?
In the common case, we don't. Projections are consistent with the aggregate at some time in the past, but do not necessarily have all of the latest updates. That's part of the point: we give up "immediate consistency" in exchange for other (higher leverage) benefits.
The duplication that you refer to is usually solved a different way: by using conditional writes to the book of record.
In your example, we would normally design the system so that the second attempt to write Bob to our data store would fail because conflict. Also, we prevent duplicates from propagating by ensuring that the write to the data store happens-before any events are made visible.
What this gives us, in effect, is a "first writer wins" write strategy. The writer that loses the data race has to retry/fail/etc.
(As a rule, this depends on the idea that both attempts to create Bob write that information to the same place, using the same locks.)
A common design to reduce the probability of conflict is to NOT use the "read model" of the aggregate itself, but to instead use its own data in the data store. That doesn't necessarily eliminate all data races, but you reduce the width of the window.
Finally, we fall back on Memories, Guesses and Apologies.
It's important to remember in CQRS that every write model is also a read model for the reads that are required to validate a command. Those reads are:
checking for the existence of an aggregate with a particular ID
loading the latest version of an entire aggregate
In general a CQRS/ES implementation will provide that read model for you. The particulars of how that's implemented will depend on the implementation.
Those are the only reads a command-handler ever needs to perform, and if a query can be answered with no more than those reads, the query can be expressed as a command (e.g. GetUserByName{Alice}) which when handled does not emit events. The benefit of such read-only commands is that they can be strongly consistent because they are limited to a single aggregate. Not all queries, of course, can be expressed this way, and if the query can tolerate eventual consistency, it may not be worth paying the coordination tax for strong consistency that you typically pay by making it a read-only command. (Command handling limited to a single aggregate is generally strongly consistent, but there are cases, e.g. when the events form a CRDT and an aggregate can live in multiple datacenters where even that consistency is loosened).
So with that in mind:
CreateUser{Alice} received
user Alice does not exist
persist UserCreated{Alice}
CreateUser{Alice} acknowledged (e.g. HTTP 200, ack to *MQ, Kafka offset commit)
UserListProjection updated from UserCreated{Alice}
CreateUser{Bob} received
user Bob does not exist
persist UserCreated{Bob}
CreateUser{Bob} acknowledged
CreateUser{Bob} received
user Bob already exists
command-handler for an existing user rejects the command and persists no events (it may log that an attempt to create a duplicate user was made)
CreateUser{Bob} ack'd with failure (e.g. HTTP 401, ack to *MQ, Kafka offset commit)
UserListProjection updated from UserCreated{Bob}
Note that while the UserListProjection can answer the question "does this user exist?", the fact that the write-side can also (and more consistently) answer that question does not in and of itself make that projection superfluous. UserListProjection can also answer questions like "who are all of the users?" or "which users have two consecutive vowels in their name?" which the write-side cannot answer.

CQRS - applying command based on decision from multiple projections

Question is related to CQRS - I have user that wants to order something from web and is presented with GUI showing his balance = 100$ and stock = 1 item. Let's say we have 2 services here AccountService and StockService with separate concerns. In order to generate GUI for client third service AggregatorService listens to domain events from AccountService and StockService, projects a view and creates GUI for clients.
When user decides to order this item, he clicks a button and Command for order is sent to AccountService. Here we load AccountAggregate in order to decrease balance for the price of the item that needs to be ordered. But before I can do this, I have to check if the item is still available (or somehow to reserve it). Only thing that comes up to my mind is:
Read current stock of the item from read model of StockService, but what can happen is that other services read model is updated just a second after I read it (e.g. somebody bought the item, so actual stock is =0. but read model still has =1).
Before decreasing a balance call some method on StockService to reserve the item for some time. If order is not successful (e.g. no enough funds on balance, I would have to un-reserve it somehow). This needs to be some sync-REST call and it is probably slower than some async solution (if any).
Are there any best practices for this kind of use-case?
You have 2 options, depending on whether you embrace eventual consistency or not.
Using immediate consistency I would have an OrderService that receives the order request and it makes async calls to AccountService.ReservePayment() and StockService.ReserveStock(). If either of those fail you call AccountService.UndoReservePayment() and StockService.UndoReserveStock(). If both succeed you call AccountService.CompleteReservePayment() and StockService.CompleteReserveStock(). Make sure each reservation should have a timestamp on it so a daemon process can occasionally run and Undo any reserves that are older than an hour or so.
This approach makes the user wait until both those services complete. If either the StockService or the AccountService are slow, the user experience is slow. I suggest a better way to do this is the eventual consistency approach which gives the user a very fast experience at the expense of receiving failure messages after the fact.
With eventual consistency you assume they have enough credit and you have enough inventory, and in response to their order request you immediately send back a success message. The user is happy and they go along to buy more stuff.
The OrderCreated event is then handled asynchronously to reserve stock and credit as described above. However, since there is no time pressure to reply to the waiting user you don’t have to scale up to handle as high a throughput. If the credit check and stock check take a minute or two, the user doesn’t care because he’s off doing other things.
The price you pay here is that the user may get a success message at the time of order submission, then a few minutes later get an email saying the sale didn’t go through after all because there’s no stock. This is what many large retailers do, including Amazon, Target, Walmart, etc. Eventual consistency can go a long way towards easing the load and cost of the back end.

DDD - How to modify several AR (from different bounded contexts) throughout single request?

I would want expose a little scenario which is still at paper state, and which, regarding DDD principle seem a bit tedious to accomplish.
Let's say, I've an application for hosting accounts management. Basically, the application compose several bounded contexts such as Web accounts management, Ftp accounts management, Mail accounts management... each of them represented by their own AR (they can live standalone).
Now, let's imagine I want to provide a UI with an HTML form that compose one fieldset for each bounded context, for instance to update limits and or features. How should I process exactly to update all AR without breaking single transaction per request principle? Can I create a kind of "outer" AR, let's say a ClientHostingProperties AR which would holds references to other AR and update them as part of single transaction, using own repository? Or should I better create an AR that emit messages to let's listeners provided by the bounded contexts react on, in which case, I should probably think about ES?
Thanks.
How should I process exactly to update all AR without breaking single transaction per request principle?
You are probably looking for a process manager.
Basic sketch: persisting the details from the submitted form is a transaction unto itself (you are offered an opportunity to accrue business value; step 1 is to capture that opportunity).
That gives you a way to keep track of whether or not this task is "done": you compare the changes in the task to the state of the system, and fire off commands (to run in isolated transactions) to make changes.
Processes, in my mind, end up looking a lot like state machines. These tasks are commands are done, these commands are not done, these commands have failed: now what? and eventually reach a state where there are no additional changes to be made, and this instance of the process is "done".
Short answer: You don't.
An aggregate is a transactional boundary, which means that if you would update multiple aggregates in one "action", you'd have to use multiple transactions. The reason for an aggregate to be equivalent to one transaction is that this allows you to guarantee consistency.
This means that you have two options:
You can make your aggregate larger. Then you can actually guarantee consistency, but your ability to handle concurrent requests gets worse. So this is usually what you want to avoid.
You can live with the fact that it's two transactions, which means you are eventually consistent. If so, you usually use something such as a process manager or a flow to handle updating multiple aggregates. In its simplest form, a flow is nothing but a simple if this event happens, run that command rule. In its more complex form, it has its own state.
Hope this helps 😊

DDD how to model time tracking?

I am developing an application that has employee time tracking module. When employee starts working (e.g. at some abstract machine), we need to save information about him working. Each day lots of employees work at lots of machines and they switch between them. When they start working, they notify the system that they have started working. When they finish working - they notify the system about it as well.
I have an aggregate Machine and an aggregate Employee. These two are aggregate roots with their own behavior. Now I need a way to build reports for any given Employee or any given Machine for any given period of time. For example, I want to see which machines did given employee used over period of time and for how long. Or I want to see which employees worked at this given machine for how long over period of time.
Ideally (I think) my aggregate Machine should have methods startWorking(Employee employee) and finishWorking(Employee employee).
I created another aggregate: EmployeeWorkTime that stores information about Machine, Employee and start,finish timestamps. Now I need a way to modify one aggregate and create another at the same time (or ideally some another approach since this way it's somewhat difficult).
Also, employees have a Shift that describes for how many hours a day they must work. The information from a Shift should be saved in EmployeeWorkTime aggregate in order to be consistent in a case when Shift has been changed for given Employee.
Rephrased question
I have a Machine, I have an Employee. HOW the heck can I save information:
This Employee worked at this Machine from 1.05.2017 15:00 to 1.05.1017 18:31.
I could do this simply using CRUD, saving multiple aggregates in one transaction, going database-first. But I want to use DDD methods to be able to manage complexity since the overall domain is pretty complex.
From what I understand about your domain you must model the process of an Employee working on a machine. You can implement this using a Process manager/Saga. Let's name it EmployeeWorkingOnAMachineSaga. It work like that (using CQRS, you can adapt to other architectures):
When an employee wants to start working on a machine the EmployeeAggregate receive the command StartWorkingOnAMachine.
The EmployeeAggregate checks that the employee is not working on another machine and if no it raises the EmployeeWantsToWorkOnAMachine and change the status of the employee as wantingToWorkOnAMachine.
This event is caught by the EmployeeWorkingOnAMachineSaga that loads the MachineAggregate from the repository and it sends the command TryToUseThisMachine; if the machine is not vacant then it rejects the command and the saga sends the RejectWorkingOnTheMachine command to the EmployeeAggregate which in turns change it's internal status (by raising an event of course)
if the machine is vacant, it changes its internal status as occupiedByAnEmployee (by raising an event)
and similar when the worker stops working on the machine.
Now I need a way to build reports for any given Employee or any given Machine for any given period of time. For example, I want to see which machines did given employee used over period of time and for how long. Or I want to see which employees worked at this given machine for how long over period of time.
This should be implemented by read-models that just listen to the relevant events and build the reports that you need.
Also, employees have a Shift that describes for how many hours a day they must work. The information from a Shift should be saved in EmployeeWorkTime aggregate in order to be consistent in a case when Shift has been changed for given Employee
Depending on how you want the system to behave you can implement it using a Saga (if you want the system to do something if the employee works more or less than it should) or as a read-model/report if you just want to see the employees that do not conform to their daily shift.
I am developing an application that has employee time tracking module. When employee starts working (e.g. at some abstract machine), we need to save information about him working. Each day lots of employees work at lots of machines and they switch between them. When they start working, they notify the system that they have started working. When they finish working - they notify the system about it as well.
A critical thing to notice here is that the activity you are tracking is happening in the real world. Your model is not the book of record; the world is.
Employee and Machine are real world things, so they probably aren't aggregates. TimeSheet and ServiceLog might be; these are the aggregates (documents) that you are building by observing the activity in the real world.
If event sourcing is applicable there, how can I store domain events efficiently to build reports faster? Should each important domain event be its own aggregate?
Fundamentally, yes -- your event stream is going to be the activity that you observe. Technically, you could call it an aggregate, but its a pretty anemic one; easier to just think of it as a database, or a log.
In this case, it's probably just full of events like
TaskStarted {badgeId, machineId, time}
TaskFinished {badgeId, machineId, time}
Having recorded these events, you forward them to the domain model. For instance, you would take all of the events with Bob's badgeId and dispatch them to his Timesheet, which starts trying to work out how long he was at each work station.
Given that Machine and Employee are aggregate roots (they have their own invariants and business logic in a complex net of interrelations, timeshift-feature is only one of the modules)
You are likely to get yourself into trouble if you assume that your digital model controls a real world entity. Digital shopping carts and real world shopping carts are not the same thing; the domain model running on my phone can't throw things out of my physical cart when I exceed my budget. It can only signal that, based on the information that it has, the contents are not in compliance with my budgeting policy. Truth, and the book of record are the real world.
Greg Young discusses this in his talk at DDDEU 2016.
You can also review the Cargo DDD Sample; in particular, pay careful attention to the distinction between Cargo and HandlingHistory.
Aggregates are information resources; they are documents with internal consistency rules.

How are consistency violations handled in event sourcing?

First of all, let me state that I am new to Command Query Responsibility Segregation and Event Sourcing (Message-Drive Architecture), but I'm already seeing some significant design benefits. However, there are still a few issues on which I'm unclear.
Say I have a Customer class (an aggregate root) that contains a property called postalAddress (an instance of the Address class, which is a value object). I also have an Order class (another aggregate root) that contains (among OrderItem objects and other things) a property called deliveryAddress (also an instance of the Address class) and a string property called status.
The customer places an order by issueing a PlaceOrder command, which triggers the OrderReceived event. At this point in time, the status of the order is "RECEIVED". When the order is shipped, someone in the warehouse issues an ShipOrder command, which triggers the OrderShipped event. At this point in time, the status of the order is "SHIPPED".
One of the business rules is that if a Customer updates their postalAddress before an order is shipped (i.e., while the status is still "RECEIVED"), the deliveryAddress of the Order object should also be updated. If the status of the Order were already "SHIPPED", the deliveryAddress would not be updated.
Question 1. Is the best place to put this "conditionally cascading address update" in a Saga (a.k.a., Process Manager)? I assume so, given that it is translating an event ("The customer just updated their postal address...") to a command ("... so update the delivery address of order 123").
Question 2. If a Saga is the right tool for the job, how does it identify the orders that belong to the user, given that an aggregate can only be retrieved by it's unique ID (in my case a UUID)?
Continuing on, given that each aggregate represents a transactional boundary, if the system were to crash after the Customer's postalAddress was updated (the CustomerAddressUpdated event being persisted to the event store) but before the OrderDeliveryAddressUpdated could be updated (i.e., between the two transactions), then the system is left in an inconsistent state.
Question 3. How are such "violations" of consistency rules detected and rectified?
In most instances the delivery address of an order should be independent of any other data change as a customer may want he order sent to an arbitrary address. That being said, I'll give my 2c on how you could approach this:
Is the best place to handle this in a process manager?
Yes. You should have an OrderProcess.
How would one get hold of the correct OrderProcess instance given that it can only be retrieve by aggregate id?
There is nothing preventing one from adding any additional lookup mechanism that associates data to an aggregate id. In my experimental, going-live-soon, mechanism called shuttle-recall I have a IKeyStore mechanism that associates any arbitrary key to an AR Id. So you would be able to associate something like [order-process]:customerId=CID-123; as a key to some aggregate.
How are such "violations" of consistency rules detected and rectified?
In most cases they could be handled out-of-band, if possible. Should I order something from Amazon and I attempt to change my address after the order has shipped the order is still going to the original address. If your case of linking the customer postal address to the active order address you could notify the customer that n number of orders have had their addresses updated but that a recent order (within some tolerance) has not.
As for the system going down before processing you should have some guaranteed delivery mechanism to handle this. I do not regard these domain event in the same way I regard system events in a messaging infrastructure such as a service bus.
Just some thoughts :)

Resources