I am still trying to wrap my head around how to apply DDD and, most recently, CQRS to a real production business application. In my case, I am working on an inventory management system. It runs as a server-based application exposed via a REST API to several client applications. My focus has been on the domain layer with the API and clients to follow.
The command side of the domain is used to create a new Order and allows modifications, cancellation, and marking an Order as fulfilled and shipped/completed. I, of course, have a query that returns a list of orders in the system (as read-only, lightweight DTOs) from the repository. Another query returns a PickList used by warehouse employees to pull items from the shelves to fulfill specific orders. In order to create the PickList, there are calculations, rules, etc. that must be evaluated to determine which orders are ready to be fulfilled; for example, whether all order line items are in stock. I need to read the same list of orders, iterate over the list, and apply those rules and calculations to determine which items should be included in the PickList.
This is not a simple query, so how does it fit into the model?
UPDATE
While I may be able to maintain (store) a set of PickLists, they really are dynamic until an employee retrieves the next PickList. Consider the following scenario:
The first Order of the day is received. I can raise a domain event that triggers an AssemblePickListCommand which applies all of the rules and logic to create one or more PickLists for that Order.
A second Order is received. The event handler should now REPLACE the original PickLists with one or more new PickLists optimized across both pending Orders.
Likewise after a third Order is received.
Let's assume we now have two PickLists in the 'queue' because the optimization rules split the items into two lists, since the components are at opposite ends of the warehouse.
Warehouse employee #1 requests a PickList. The first PickList is pulled and printed.
A fourth Order is received. As before, the handler removes the second PickList from the queue (the only one remaining) and regenerates one or more PickLists based on the second PickList and the new Order.
The PickList 'assembler' will repeat this logic whenever a new Order is received.
My issue with this is that a request must either block while the PickList queue is being updated, or I have an eventual consistency issue that goes against the behavior the customer wants. Each time they request a PickList, they want it optimized based on all of the Orders received up to that point in time.
This sounds to me like you are getting tangled trying to use a language that doesn't actually match the domain you are working in.
In particular, I don't believe that you would be having these modeling problems if the PickList "queue" were a real thing. I think instead there is an OrderItem collection that lives inside some aggregate, and you issue commands to that aggregate to generate a PickList.
That is, I would expect a flow that looks like
onOrderPlaced(List<OrderItems> items)
    warehouse.reserveItems(List<OrderItems> items)
        // At this point, the items are copied into an unassigned
        // items collection. In other words, the aggregate knows
        // that the items have been ordered, and are not currently
        // assigned to any picklist
        fire(ItemsReserved(items))

onPickListRequested(Id<Employee> employee)
    warehouse.assignPickList(Id<Employee> employee, PickListOptimizer optimizer)
        // PickListOptimizer is your calculation, rules, etc. that know how
        // to choose the right items to put into the next pick list from a
        // given collection of unassigned items. This is a stateless
        // *domain service* -- it provides the query that the warehouse aggregate needs
        // to figure out the right change to make, but it *doesn't* change
        // the state of the aggregate -- that's the aggregate's responsibility
        List<OrderItems> pickedItems = optimizer.chooseItems(this.unassignedItems);
        this.unassignedItems.removeAll(pickedItems);

        // This mockup assumes we can consider PickLists to be entities
        // within the warehouse aggregate. You'd need some additional
        // events if you wanted the PickList to have its own aggregate
        Id<PickList> id = PickList.createId(...);
        this.pickLists.put(id, new PickList(id, employee, pickedItems));
        fire(PickListAssigned(id, employee, pickedItems));

onPickListCompleted(Id<PickList> pickList)
    warehouse.closePickList(Id<PickList> pickList)
        this.pickLists.remove(pickList);
        fire(PickListClosed(pickList));

onPickListAbandoned(Id<PickList> pickList)
    warehouse.reassign(Id<PickList> pickList)
        PickList list = this.pickLists.remove(pickList);
        this.unassignedItems.addAll(list.pickedItems);
        fire(ItemsReassigned(list.pickedItems));
Not great languaging -- I don't speak warehouse. But it covers most of your points: each time a new PickList is generated, it's being built from the latest state of pending items in the warehouse.
There's some contention: you can't assign items to a pick list AND change the unassigned items at the same time. Those are two different writes to the same aggregate, and I don't think you are going to get around that as long as the client insists upon a perfectly optimized picklist each time. It might be worthwhile to sit down with the domain experts and explore the real cost to the business if the second-best pick list is assigned from time to time. After all, there's already latency between placing the order and its arrival at the warehouse....
I don't really see what your specific question is. But the first thing that comes to mind is that pick list creation is not just a query but a full-blown business concept that should be explicitly modeled. It could then be created with an AssemblePicklist command, for instance.
You seem to have two roles/processes and possibly also two aggregate roots: a salesperson works with orders, a warehouse worker with picklists.
AssemblePicklistsCommand() is triggered from order processing and recreates all currently unassigned picklists.
The warehouse worker fires an AssignPicklistCommand(userid), which tries to choose the most appropriate unassigned picklist and assign it to him (or does nothing if he already has an active picklist). He could then use GetActivePicklistQuery(userid) to get the picklist, pick items with PickPicklistItemCommand(picklistid, item, quantity), and finally MarkPicklistCompleteCommand() to signal he's done with the order.
AssemblePicklist and AssignPicklist should block each other (serial processing, optimistic concurrency?), but the relation between AssignPicklist and GetActivePicklist is clean: either you have a picklist assigned or you don't.
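To make the optimistic-concurrency idea concrete, here is a minimal TypeScript sketch; PicklistStore, ConcurrencyError and the optimize() placeholder are illustrative names, not anything from your model. Both commands load the unassigned picklists, make their change, and save with an expected version, so whichever one loses the race gets a ConcurrencyError and can simply retry.

// Hypothetical store for the unassigned picklists, guarded by a version number.
interface Picklist { id: string; items: string[]; assignedTo?: string }
interface UnassignedPicklists { version: number; picklists: Picklist[] }

class ConcurrencyError extends Error {}

class PicklistStore {
  private state: UnassignedPicklists = { version: 0, picklists: [] };

  load(): UnassignedPicklists {
    // Hand out a copy so callers never mutate the stored state directly.
    return { version: this.state.version, picklists: [...this.state.picklists] };
  }

  save(next: Picklist[], expectedVersion: number): void {
    // Optimistic concurrency: reject the write if someone else got in first.
    if (this.state.version !== expectedVersion) {
      throw new ConcurrencyError("unassigned picklists were modified concurrently");
    }
    this.state = { version: expectedVersion + 1, picklists: next };
  }
}

// AssemblePicklistsCommand: triggered from order processing,
// recreates all currently unassigned picklists.
function assemblePicklists(store: PicklistStore, pendingItems: string[]): void {
  const current = store.load();
  store.save(optimize(pendingItems), current.version); // optimize() = your rules
}

// AssignPicklistCommand: hand the most appropriate unassigned picklist to a user.
function assignPicklist(store: PicklistStore, userId: string): Picklist | undefined {
  const current = store.load();
  const next = current.picklists[0]; // real selection logic goes here
  if (!next) return undefined;
  store.save(current.picklists.slice(1), current.version);
  return { ...next, assignedTo: userId };
}

// Placeholder for the real optimization rules and calculations.
function optimize(items: string[]): Picklist[] {
  return [{ id: `picklist-${Date.now()}`, items }];
}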
Is there a way to attach webhooks or get events from Azure Search?
Specifically we are looking for way to get notified (programmatically) when an indexer completes indexing an index.
Currently, there are no such events. However, you can implement functionality like this yourself. There are several scenarios to consider. Basically, you have two main approaches to adding content: either define a content source and use pull, or use the API to push content to the index.
The simplest scenario is when you use push via the API to add a single item. You could create a wrapper method that submits your item and then queries the index until that item is found. Your wrapper method would need to either call a callback or fire an event. To support updates to an item, you would need a marker on the item, like a timestamp property that indicates when the item was submitted to the index, or a version number, or something else that allows you to distinguish the new item from the old.
A more complex scenario is when you handle batches or large volumes of content. Assuming you start from scratch and your corpus is 100,000 items, you could query until the count matches 100,000 items before you fire your event. To handle updates, the best approach is to use some marker. E.g. you submit a batch of 100 updates at 2020-08-18 09:58. You could then query the index, filtering by items that were updated after the timestamp at which you submitted your content. Once the count from your query matches 100, you can fire your event.
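For example, a polling helper along these lines could raise your "indexing complete" event. This is only a sketch, assuming Node 18+ (global fetch), the 2020-06-30 REST api-version, and an index with a filterable submittedAt field; the service name, index name, field name and callback are placeholders, not known values.

// Poll an Azure Cognitive Search index until the expected number of documents
// updated after `since` are visible, then invoke the callback.
async function waitForIndexedCount(
  searchService: string,
  indexName: string,
  apiKey: string,
  since: Date,
  expectedCount: number,
  onIndexed: () => void,
  pollMs = 2000
): Promise<void> {
  // OData filter on a filterable submittedAt field (an assumption about your index).
  const filter = encodeURIComponent(`submittedAt ge ${since.toISOString()}`);
  const url =
    `https://${searchService}.search.windows.net/indexes/${indexName}/docs` +
    `?api-version=2020-06-30&search=*&$count=true&$top=1&$filter=${filter}`;

  for (;;) {
    const response = await fetch(url, { headers: { "api-key": apiKey } });
    if (!response.ok) throw new Error(`search query failed: ${response.status}`);
    const body = (await response.json()) as { "@odata.count": number };
    if (body["@odata.count"] >= expectedCount) {
      onIndexed(); // your "indexing complete" notification
      return;
    }
    await new Promise((resolve) => setTimeout(resolve, pollMs));
  }
}

// Usage, e.g. after pushing a batch of 100 updates:
// await waitForIndexedCount("my-service", "my-index", key, batchSubmittedAt, 100, notify);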
You would also need to handle indexing errors or exceptions when submitting content in these scenarios.
For pull-scenarios your best option is to define a skill that adds a timestamp to items. You could then poll the index with a query, filtering by content with a timestamp after the point indexing started and then fire your event.
In my domain I have Product and Order aggregates. Orders reference Products. This is a 1 to n relationship, so a Product has many Orders and Orders belong to a Product. When a Product is discontinued a ProductDiscontinued event is published and all Orders that belong to that Product must be cancelled. So there's an adapter that receives the ProductDiscontinued event via RabbitMQ. The adapter then delegates cancelling Orders to an application service. How can I achieve that a single Order is cancelled in a single transaction? Should the adapter iterate all Orders of the discontinued Product and call the application service for every single Order? Should I just ignore that I modify more than one aggregate in a single transaction and call the application service just once with a list of all affected OrderIds? Is there a better solution?
From the DDD point of view, the Aggregate is the transaction boundary; the transaction should not be larger than the Aggregate. This rule exists to force you to design your Aggregates correctly, so that you don't depend on multiple Aggregates being modified in the same transaction.
However, from what I can see, you already designed your Aggregates with that in mind.
Should the adapter iterate all Orders of the discontinued Product and call the application service for every single Order?
This is the normal way of doing things.
Should I just ignore that I modify more than one aggregate in a single transaction and call the application service just once with a list of all affected OrderIds?
In the context of what I wrote earlier, you may do that if it somehow offers better performance (I don't see how a bigger transaction can give better performance, but hey, it also depends on the code).
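For illustration, the adapter can stay very thin. This is a rough TypeScript sketch where OrderRepository, OrderCancellationService and the event shape are stand-ins for whatever your application already has, not known names.

// Message received from RabbitMQ when a product is discontinued.
interface ProductDiscontinued { productId: string }

interface OrderRepository {
  findOrderIdsByProduct(productId: string): Promise<string[]>;
}

interface OrderCancellationService {
  // Cancels exactly one Order; one transaction per call.
  cancelOrder(orderId: string): Promise<void>;
}

// The adapter only translates the event into N single-aggregate commands.
async function onProductDiscontinued(
  event: ProductDiscontinued,
  orders: OrderRepository,
  cancellation: OrderCancellationService
): Promise<void> {
  const orderIds = await orders.findOrderIdsByProduct(event.productId);
  for (const orderId of orderIds) {
    // Each iteration is its own transaction; a failure on one order
    // does not roll back the orders that were already cancelled.
    await cancellation.cancelOrder(orderId);
  }
}

Each cancelOrder call is its own transaction, so a failure part-way through only affects that one Order, and the message can be retried safely as long as cancelling an already-cancelled Order is a no-op.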
I'm using a DDD/CQRS/ES approach and I have some questions about modeling my aggregate(s) and queries. As an example consider the following scenario:
A User can create a WorkItem, change its title and associate other users to it. A WorkItem has participants (associated users) and a participant can add Actions to a WorkItem. Participants can execute Actions.
Let's just assume that Users are already created and I only need userIds.
I have the following WorkItem commands:
CreateWorkItem
ChangeTitle
AddParticipant
AddAction
ExecuteAction
These commands must be idempotent, so I can't add the same user or action twice.
And the following query:
WorkItemDetails (all info for a work item)
Queries are updated by handlers that handle domain events raised by WorkItem aggregate(s) (after they're persisted in the EventStore). All these events contain the WorkItemId. I would like to be able to rebuild the queries on the fly, if needed, by loading all the relevant events and processing them in sequence. This is because my users usually won't access WorkItems created one year ago, so I don't need to have these queries processed. So when I fetch a query that doesn't exist, I could rebuild it and store it in a key/value store with a TTL.
Domain events have an aggregateId (used as the event streamId and shard key) and a sequenceId (used as the eventId within an event stream).
So my first attempt was to create a large Aggregate called WorkItem that had a collection of participants and a collection of actions. Participant and Actions are entities that live only within a WorkItem. A participant references a userId and an action references a participantId. They can have more information, but it's not relevant for this exercise. With this solution my large WorkItem aggregate can ensure that the commands are idempotent because I can validate that I don't add duplicate participants or actions, and if I want to rebuild the WorkItemDetails query, I just load/process all the events for a given WorkItemId.
This works fine because I only have one aggregate: the WorkItemId can be the aggregateId, so when I rebuild the query I just load all events for a given WorkItemId.
However, this solution has the performance issues of a large Aggregate (why load all participants and actions to process a ChangeTitle command?).
So my next attempt is to have different aggregates, all with the same WorkItemId as a property, but only the WorkItem aggregate has it as its aggregateId. This fixes the performance issues, and I can still update the query because all events contain the WorkItemId. But now my problem is that I can't rebuild it from scratch: I don't know the aggregateIds of the other aggregates, so I can't load their event streams and process them. They have a WorkItemId property, but that's not their real aggregateId. Also, I can't guarantee that I process events sequentially, because each aggregate has its own event stream, but I'm not sure if that's a real problem.
Another solution I can think of is to have a dedicated event stream that consolidates all WorkItem events raised by the multiple aggregates. So I could have event handlers that simply append the events fired by the Participant and Action aggregates to an event stream whose id would be something like "{workItemId}:allevents". This would be used only to rebuild the WorkItemDetails query. This sounds like a hack... basically I'm creating an "aggregate" that has no business operations.
What other solutions do I have? Is it uncommon to rebuild queries on the fly? Can it be done when events from multiple aggregates (multiple event streams) are used to build the same query? I've searched for this scenario and haven't found anything useful. I feel like I'm missing something that should be very obvious, but I haven't figured out what.
Any help on this is very much appreciated.
Thanks
I don't think you should design your aggregates with querying concerns in mind. The Read side is here for that.
On the domain side, focus on consistency concerns (how small can the aggregate be while the domain still remains consistent in a single transaction?), concurrency (how big can it be without suffering concurrent access problems / race conditions?) and performance (would we load thousands of objects into memory just to perform a simple command? -- exactly what you were asking).
I don't see anything wrong with on-demand read models. It's basically the same as reading from a live stream, except you re-create the stream when you need it. However, this might be quite a lot of work for not an extraordinary gain, because most of the time entities are queried just after they are modified. If on-demand becomes "basically every time the entity changes", you might as well subscribe to live changes. As for "old" views, the definition of "old" is that they are not modified any more, so they don't need to be recalculated anyway, regardless of whether you have an on-demand or continuous system.
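As an illustration of the on-demand variant, here is a TypeScript sketch. The event types, the EventStore.loadEvents call and the TTL cache interface are assumptions standing in for your infrastructure, not known APIs.

// On-demand read model: check the cache first, rebuild from events on a miss.
interface DomainEvent {
  type: "WorkItemCreated" | "TitleChanged" | "ParticipantAdded" | "ActionAdded";
  workItemId: string;
  payload: Record<string, string>;
}

interface WorkItemDetails {
  workItemId: string;
  title: string;
  participants: string[];
  actions: string[];
}

interface EventStore { loadEvents(streamId: string): Promise<DomainEvent[]> }
interface TtlCache {
  get(key: string): Promise<WorkItemDetails | undefined>;
  set(key: string, value: WorkItemDetails, ttlSeconds: number): Promise<void>;
}

// Fold the event stream into the read model (assumed payload fields).
function project(workItemId: string, events: DomainEvent[]): WorkItemDetails {
  const details: WorkItemDetails = { workItemId, title: "", participants: [], actions: [] };
  for (const event of events) {
    switch (event.type) {
      case "WorkItemCreated":
      case "TitleChanged":
        details.title = event.payload.title;
        break;
      case "ParticipantAdded":
        details.participants.push(event.payload.userId);
        break;
      case "ActionAdded":
        details.actions.push(event.payload.actionId);
        break;
    }
  }
  return details;
}

async function getWorkItemDetails(
  workItemId: string,
  cache: TtlCache,
  store: EventStore,
  ttlSeconds = 3600
): Promise<WorkItemDetails> {
  const cached = await cache.get(workItemId);
  if (cached) return cached;

  // Cache miss: rebuild the view from the event stream and cache it with a TTL.
  const events = await store.loadEvents(workItemId);
  const details = project(workItemId, events);
  await cache.set(workItemId, details, ttlSeconds);
  return details;
}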
If you go the multiple-small-aggregates route and your Read Model needs information from several sources to update itself, you have a couple of options:
Enrich emitted events with additional data
Read from multiple event streams and consolidate their data to build the read model (there is a sketch of this below). No magic here: the Read side needs to know which aggregates are involved in a particular projection. You could also query other Read Models if you know they are up to date and will give you just the data you need.
See CQRS events do not contain details needed for updating read model
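A rough sketch of that consolidation follows; the recordedAt ordering field and the way you discover the relevant streamIds are the assumptions your design would need to fill in.

interface StoredEvent {
  streamId: string;
  sequenceId: number;   // position within its own stream
  recordedAt: number;   // e.g. a store-assigned timestamp or global position
  type: string;
  payload: Record<string, string>;
}

interface EventStore {
  loadEvents(streamId: string): Promise<StoredEvent[]>;
}

// The read side has to know which streams feed this projection, for example
// via a lookup you maintain whenever one of the related aggregates is created.
async function loadWorkItemEvents(
  store: EventStore,
  streamIds: string[]
): Promise<StoredEvent[]> {
  const streams = await Promise.all(streamIds.map((id) => store.loadEvents(id)));
  // Merge the streams into a single ordered sequence before projecting.
  return streams.flat().sort((a, b) => a.recordedAt - b.recordedAt);
}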
I am trying to write a node program that takes a stream of data (using xml-stream) and consolidates it and writes it to a database (using mongoose). I am having problems figuring out how to do the consolidation, since the data may not have hit the database by the time I am processing the next record. I am trying to do something like:
on order data being read from stream
    look to see if customer exists on mongodb collection
    if customer exists
        add the order to the document
    else
        create the customer record with just this order
    save the customer
My problem is that two 'nearby' orders for a customer cause duplicate customer records to be written, since the first one hasn't been written before the second one checks to see if it's there.
In theory I think I could get around the problem by pausing the xml-stream, but there is a bug preventing me from doing this.
Not sure that this is the best option, but using an async queue is what I ended up doing.
Around the same time, a pull request that allowed pausing was added to xml-stream (which is what I was using to process the stream).
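For anyone landing here, the shape of that workaround is roughly the following TypeScript sketch using the async package with a concurrency of 1; saveOrderForCustomer and mapToOrderRecord are placeholders for the real mongoose logic, not functions from the question.

import * as async from "async";

interface OrderRecord { customerId: string; orderId: string; total: number }

// Stand-in for the real "find or create customer, then append the order" logic.
async function saveOrderForCustomer(order: OrderRecord): Promise<void> {
  // ... mongoose lookup / update goes here ...
}

// Concurrency of 1 means orders are written strictly one at a time,
// so the second order for a customer always sees the first one's write.
const orderQueue = async.queue<OrderRecord>(async (order) => {
  try {
    await saveOrderForCustomer(order);
  } catch (err) {
    console.error("failed to save order", order.orderId, err);
  }
}, 1);

// In the xml-stream handler, push onto the queue instead of writing directly,
// e.g. xml.on("endElement: order", (o) => orderQueue.push(mapToOrderRecord(o)));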
Is there a unique field on the customer object in the data coming from the stream? You could add a unique index to your mongoose schema to prevent duplicates at the database level.
When creating new customers, add some fallback logic to handle the case where you try to create a customer but that same customer is created by another save at the same time. When this happens, retry the save, but first fetch the existing customer and add the order to the fetched customer document.
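A sketch of that combination with mongoose follows; the schema fields and names are illustrative assumptions, not a drop-in. The unique index is what turns the race into a catchable duplicate-key error (code 11000), and the retry then takes the update path.

import mongoose from "mongoose";

const orderSchema = new mongoose.Schema({
  orderId: String,
  total: Number,
});

const customerSchema = new mongoose.Schema({
  // The unique index stops two "nearby" orders from creating
  // two customer documents for the same customer.
  customerId: { type: String, required: true, unique: true },
  name: String,
  orders: [orderSchema],
});

const Customer = mongoose.model("Customer", customerSchema);

async function addOrder(
  customerId: string,
  name: string,
  order: { orderId: string; total: number }
): Promise<void> {
  try {
    // Atomic upsert: create the customer if missing, append the order either way.
    await Customer.updateOne(
      { customerId },
      { $setOnInsert: { name }, $push: { orders: order } },
      { upsert: true }
    );
  } catch (err: any) {
    // 11000 = duplicate key: another save created the customer between
    // our non-match and our insert. Retry; this time the update path wins.
    if (err.code === 11000) {
      await Customer.updateOne({ customerId }, { $push: { orders: order } });
    } else {
      throw err;
    }
  }
}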
In DDD, an aggregate root can have a repository. Let us take an Order aggregate and its non-persistent counterpart OrderRepository and persistent counterpart OrderUoW. We also have a ProductVariant aggregate which tracks the inventory of the products in the order. It can have a ProductVariantRepository and ProductVariantUoW.
The way the Order and the ProductVariant work is that before the order is persisted, the inventory is checked. If there is inventory, the order is persisted by calling OrderUoW.Commit(), and ProductVariantUoW.Commit() is called next to update the inventory of the products.
Unfortunately, things can go bad: another user bought the same products in that short window (consider this a web app where two users are buying the same products at once). Now the whole transaction for the second user should fail by reverting the order that was just created. Should I call the OrderUoW to roll back the changes (the order should be deleted from the db)? Or should I put both UoW.Commit() operations in a transaction scope, so that the failure of one Commit() rolls back the changes? Or should both repositories (Order, ProductVariant) share a single UoW with only one transaction scope?
To make a long story short: how is the transaction handled when there are multiple repositories involved?
A question we could ask is: who is doing the following?
The way the Order and the ProductVariant work is that before the order is persisted, the inventory is checked. If there is inventory, the order is persisted by calling OrderUoW.Commit(), and ProductVariantUoW.Commit() is called next to update the inventory of the products.
Some argue that this kind of work belongs in the service layer, which allows the service layer to put things crossing aggregate objects into a single transaction.
According to http://www.infoq.com/articles/ddd-in-practice:
Some developers prefer managing the transactions in the DAO classes which is a poor design. This results in too fine-grained transaction control which doesn't give the flexibility of managing the use cases where the transactions span multiple domain objects. Service classes should handle transactions; this way even if the transaction spans multiple domain objects, the service class can manage the transaction since in most of the use cases the Service class handles the control flow.
I think, as an alternative to using a single transaction, you can claim the inventory using ProductVariant and, if all the necessary inventory items are available, then commit the order. Otherwise (i.e. you can't claim all the products you need for the order) you have to return the inventory that was successfully claimed using compensating transactions. The result is that in the case of an unsuccessful commit of an order, some of the inventory will temporarily appear unavailable for other orders, but the advantage is that you can work without a distributed transaction.
Nonetheless, this logic still belongs in the service layer, not the DAO classes.
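To illustrate, the claim-then-compensate flow at the service layer might look like the following sketch; InventoryService, OrderRepository and their methods are placeholders for whatever your aggregates and UoWs actually expose.

interface OrderLine { productVariantId: string; quantity: number }

interface InventoryService {
  // Returns true if the claim succeeded; each claim is persisted immediately.
  claim(productVariantId: string, quantity: number): Promise<boolean>;
  release(productVariantId: string, quantity: number): Promise<void>;
}

interface OrderRepository {
  saveOrder(orderId: string, lines: OrderLine[]): Promise<void>;
}

// Service-layer use case: claim every line, then commit the order.
// If any claim fails, compensate by releasing the claims that did succeed.
async function placeOrder(
  orderId: string,
  lines: OrderLine[],
  inventory: InventoryService,
  orders: OrderRepository
): Promise<boolean> {
  const claimed: OrderLine[] = [];
  for (const line of lines) {
    const ok = await inventory.claim(line.productVariantId, line.quantity);
    if (!ok) {
      // Compensating transactions: put back what we already took.
      for (const done of claimed) {
        await inventory.release(done.productVariantId, done.quantity);
      }
      return false;
    }
    claimed.push(line);
  }
  await orders.saveOrder(orderId, lines);
  return true;
}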
The way you are using unit of work seems a bit fine-grained. Just in case you haven't read Martin Fowler's take: http://martinfowler.com/eaaCatalog/unitOfWork.html
That being said, you want to handle the transaction at the use-case level. The fact that the inventory is checked up-front is simply a convenience (UX), and the stock level should be checked when persisting the various bits as well. An exception can be raised for insufficient stock.
The transaction isolation level should be set such that the two 'simultaneous' parts are performed serially. So whichever one gets to update the stock levels first is going to 'win'. The second will then raise the exception.
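One alternative way to get that "whoever updates first wins" behavior, without relying on a serializable isolation level, is to make the stock check and the decrement a single conditional statement. A sketch with node-postgres, assuming a product_variants table with id and stock columns (the table and column names are placeholders):

import { Pool } from "pg";

const pool = new Pool(); // connection settings come from the usual PG* environment variables

class InsufficientStockError extends Error {}

// Decrement stock only if enough is left. The WHERE clause makes the check and
// the update one atomic statement, so of two 'simultaneous' orders the second
// simply matches zero rows and raises the exception.
async function reserveStock(productVariantId: string, quantity: number): Promise<void> {
  const result = await pool.query(
    `UPDATE product_variants
        SET stock = stock - $2
      WHERE id = $1 AND stock >= $2`,
    [productVariantId, quantity]
  );
  if (result.rowCount === 0) {
    throw new InsufficientStockError(
      `not enough stock for product variant ${productVariantId}`
    );
  }
}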
If you can use a single UoW then do so, because it's easier.
If your repositories are on different DBs (or maybe one is file-based and the others are not) then you may be forced to use multiple UoWs, but then you're writing rollback commands too, because if UoW1 saves changes to SqlRepo OK but UoW2 then fails to save changes to FileRepo, you need to roll back SqlRepo. Don't bother writing all that rollback-command stuff if you can avoid it!