In an event sourced system, I have an aggregate root of type Order. Let's assume the following events take place:
OrderPlaced (orderId, placedAt, customerId, orderLines) where OrderLine (lineId, productId, price)
OrderAccepted (orderId)
And let's assume we need two different projections:
A projection holding the total price of all accepted orders, grouped by year, for each customer. Something like this:
OrdersByCustomer(customerId, summationOnAcceptedOrdersByYear) where SummationOnAcceptedOrdersByYear(year, sum)
The issue here is that OrderAccepted doesn't contain customerId. So when the projection receives an OrderAccepted it has no way of getting the current projection state, as the customerId is the documentId. Worth noting is that I'm storing the projections in Scylla - which is only queryable by the partition key - and the projection state is just a JSON representation. So it's not queryable by anything other than the documentId/projectionId. Maybe this is not an ideal choice of technology for projections...?
I'm thinking I have two options if going forward with Scylla:
Either pollute OrderAccepted with customerId. But I don't feel this is a good approach - then I would need to incorporate that into all events related to a projection where the projectionId/documentId is not the same as the aggregateId.
Or have a separate table which contains a mapping between orderId and customerId, so the projection could query for customerId - this table would probably need to be updated in the command handler of the Order aggregate.
Alternatively we could have a CustomerIdByOrderId projection - but the projections can be at different positions in the event stream, which might cause issues as well.
A projection which sums up the price of all accepted orders for each product. Something like this:
SummationForProducts (productId, orderSummation)
So here we have a projection which relies on both OrderPlaced and OrderAccepted. And since an OrderPlaced may contain multiple orderLines, thus spanning multiple projections, we would need to update multiple projections when receiving OrderPlaced. Is this normal in event sourcing projections - to update multiple projections per event?
And the same issue arises here, since OrderAccepted doesn't include the productIds from the OrderPlaced event. So here we could probably take a similar approach and have a table which maps orderId to productIds.
I wonder how more experienced event sourcerers solve these things...? :) Any input on this is highly appreciated.
The beauty with event sourced systems is that all the information you need is - or at least should be - encapsulated in the events.
I'm going to assume you are using the CQRS pattern, and from your description the problem is centered around what the projection(s) should look like. In other words, you should not really need to change anything on the command/event side, but rather focus on how to get the data you need from your events to build up the appropriate projections.
I'm not familiar with Scylla, but you are free to build up your projections whichever way works for you. Remember that a projection is just data that you can query built up from events.
The simplest thing I can imagine is to just store each order, including the status. Something like this (in terrible pseudo code):
-- Order summary event handler
on(orderPlaced):
  insert into Orders (orderId, datePlaced, customerId, orderLines, status)
  values (orderPlaced.orderId, orderPlaced.date, orderPlaced.customerId, orderPlaced.orderLines, 'placed')

on(orderAccepted):
  update Orders
  set status = 'accepted'
  where orderId = orderAccepted.orderId

-- Sum of accepted orders per customer and year
select customerId, year(datePlaced), sum(orderLines.price) from Orders
where status = 'accepted'
group by customerId, year(datePlaced)

-- Sum per product of accepted orders
select orderLines.productId, sum(orderLines.price) from Orders
where status = 'accepted'
group by orderLines.productId
Note that the above code assumes events are processed in order.
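To make the idea a bit more concrete, here is a minimal in-memory TypeScript sketch of that projection. The event and row shapes are hypothetical (based on the question), and a real version would write to your store instead of a Map:

```typescript
// Hypothetical event shapes, mirroring the events from the question.
interface OrderLine { lineId: string; productId: string; price: number; }
interface OrderPlaced { orderId: string; placedAt: string; customerId: string; orderLines: OrderLine[]; }
interface OrderAccepted { orderId: string; }

// Projection state: one row per order, keyed by orderId.
interface OrderRow { datePlaced: string; customerId: string; orderLines: OrderLine[]; status: 'placed' | 'accepted'; }

const orders = new Map<string, OrderRow>();

function onOrderPlaced(e: OrderPlaced): void {
  orders.set(e.orderId, { datePlaced: e.placedAt, customerId: e.customerId, orderLines: e.orderLines, status: 'placed' });
}

function onOrderAccepted(e: OrderAccepted): void {
  const row = orders.get(e.orderId);
  if (row) row.status = 'accepted'; // assumes the OrderPlaced was processed first
}

// Query: sum of accepted orders per customer and year.
function sumByCustomerAndYear(): Map<string, number> {
  const result = new Map<string, number>();
  for (const row of orders.values()) {
    if (row.status !== 'accepted') continue;
    const key = `${row.customerId}:${row.datePlaced.slice(0, 4)}`; // year from ISO date
    const total = row.orderLines.reduce((sum, l) => sum + l.price, 0);
    result.set(key, (result.get(key) ?? 0) + total);
  }
  return result;
}
```

Because the full order (including customerId and orderLines) is stored keyed by orderId, the OrderAccepted handler never needs customerId on the event itself.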
"we would need to update multiple projections when receiving OrderPlaced. Is this normal in Event Sourcing Projections - to update multiple projections per event?"
Sure, it's perfectly normal. You can use a single event to update multiple projections or multiple events to update a single projection, or a combination of both.
Suppose I have database tables Customer, Order, and Item, and an OrderRepository that accesses, directly with SQL/my ORM, both the Order and Item tables. E.g. I could have a method getItems on the OrderRepository that returns all items of that order.
Suppose I now also create an ItemRepository. Given that I now have 2 repositories accessing the same database table, is that generally considered poor design? My thinking is: sometimes a user wants to update the details of an Item (e.g. its name), but when using the OrderRepository, it doesn't really make sense not to be able to access the items directly (you want to know about all the items in an order).
Of course, the OrderRepository could internally create an ItemRepository and call methods like getItemsById(ids: string[]). However, consider the case where I want to get all orders and items ever purchased by a Customer. Assuming you had the orderIds for a customer, you could have a getOrders(ids: string[]) on the OrderRepository to fetch all the orders, and then do a second query to fetch all the Items. I feel you make your life harder (and less efficient) in the sense that you have to do the join to match items with orders in the app code rather than doing a join in SQL.
If it's not considered bad practice, is there some kind of limit to how much overlap repositories should have with each other? I've spent a while trying to search for this on the web, but it seems all the tutorials/blogs/videos really don't go further than 1 table per entity (which may be an anti-pattern).
Or am I missing a trick?
Thanks
FYI: using express with TypeScript (not C#)
Is a repository creating another repository considered acceptable? Shouldn't only the service layer do that?
It's difficult to separate the database model from the DDD design, but you have to.
In your example:
GetItems should have this signature: OrderRepository.GetItems(ids: int[]): ItemEntity[]. Note that this method returns entities (not DAOs from your ORM). To hydrate the ItemEntity objects, the method might pull information from several DAOs (tables, through your ORM), but it should only pull what it needs.
Say you want to update an item's name using the ItemRepository; your signature for that could look like ItemRepository.rename(id: int, name: string): void. When this method does its work, it could change the same table as GetItems above, but note that it could also change other tables as well (for example, it could add an audit of the change to an AuditTable).
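Since you're on TypeScript, those two repositories might be sketched like this. It's an illustrative in-memory version (the array stands in for the shared Item table behind your ORM); the names and shapes are mine, not a prescribed design:

```typescript
// A single in-memory "table" shared by both repositories,
// standing in for the real Item table behind an ORM.
const itemTable: { id: number; name: string; price: number }[] = [
  { id: 1, name: 'book', price: 10 },
  { id: 2, name: 'pen', price: 2 },
];

// Domain entity: what repositories return, not a raw ORM row/DAO.
interface ItemEntity { id: number; name: string; price: number; }

// OrderRepository reads items in order to hydrate order-related entities.
class OrderRepository {
  async getItems(ids: number[]): Promise<ItemEntity[]> {
    return itemTable
      .filter(row => ids.includes(row.id))
      .map(row => ({ ...row })); // return entities, not the underlying rows
  }
}

// ItemRepository mutates the same table through its own use cases.
class ItemRepository {
  async rename(id: number, name: string): Promise<void> {
    const row = itemTable.find(r => r.id === id);
    if (row) row.name = name;
    // a real implementation might also write an audit row here
  }
}
```

The point is that both repositories touching the Item table is fine, because each exposes only the operations its own aggregate or use case needs.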
DDD gives you the ability to use different tables for different contexts if you want. It gives you enough flexibility to make really bold choices when it comes to the infrastructure that surrounds your domain. So ultimately, it's a matter of what makes sense for your specific situation and team. Some teams would apply CQRS, and the GetItems and rename methods would look completely different under the covers.
Is there a way to insert activities into a feed so they appear as if they were added at a specific time in the past? I had assumed that when adding items to a feed it would use the 'time' value to sort the results, even when propagated to other feeds following the initial feed, but it seems that's not the case and they just get sorted by the order in which they were added to the feed.
I'm working on a timeline view for our users, and I have a couple of reasons for wanting to insert activities at previous points in time:
1) We have a large number of entities in our database but a relatively small number of them will be followed (especially at first), so to be more efficient I had planned to only add activities for an entity once it had at least one follower. Once somebody follows it I would like to go back 14 days and insert activities for that entity as if they were created at the time they occurred, so the new follower would see them in their feed at the appropriate place. Currently they will just see a huge group of activities from the past at the top of their feed which is not useful.
2) Similarly, we already have certain following relationships within our database and at launch I would like to go back a certain amount of time and insert activities for all entities that already have followers so that the feed is immediately useful.
Is there any way to do this, or am I out of luck?
My feeds are a combination of flat and aggregated feeds - the main timeline for a user is aggregated, but most entity feeds are flat. All of my aggregation groups would be based on the time of the activity so ideally there would be a way to sort the final aggregation groups by time as well.
Feeds on Stream are sorted differently depending on their type:
Flat feeds are sorted by activity time, descending
Aggregated feeds and Notification feeds sort activity groups based on last-updated (activities inside groups are sorted by time descending)
This means that you can back-fill flat feeds but not aggregated feeds.
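A tiny TypeScript sketch of the difference (the activity and group shapes are made up for illustration):

```typescript
interface Activity { id: string; time: string; }  // ISO timestamps
interface Group { key: string; updatedAt: string; activities: Activity[]; }

// Flat feed: ordering is derived from the activity's own time,
// so a back-filled activity with an old time lands in the right place.
function flatFeedOrder(activities: Activity[]): Activity[] {
  return [...activities].sort((a, b) => b.time.localeCompare(a.time));
}

// Aggregated feed: groups are ordered by when they were last updated,
// so inserting an old activity still bumps its group to the top.
function aggregatedFeedOrder(groups: Group[]): Group[] {
  return [...groups].sort((a, b) => b.updatedAt.localeCompare(a.updatedAt));
}
```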
One possible way to get something similar to what you describe is to create the follow relationship with copy_limit set to a low number, so that only the most recent activities are propagated to followers.
I have a question about DynamoDB, or rather about how to model a table.
Problem description:
Goal: Users can save price alerts for products.
For example: A user wants to save an alert for when the price for product x is less than a target price.
What I want to persist specifically are: product, userId, targetPrice, operator.
operator could be equal, less or greater (I would do the validation of these values in a step before persisting).
A user can add multiple alerts for the same product where the targetPrice and/or the operator would differ. If all of those attributes are the same then it should not create a duplicate item in the db.
And the alerts should be completely separated for each user of course.
My main "read" case is to get all the alerts for a product.
My current solution is to have product as the primary key (whenever I mention product, I am talking about a unique identifier for a product) and an alertId as the sort key.
The alertId is a composite key of all the attributes: product:userId:targetPrice:operator.
So for example: greatBook12:1234:34:lesser.
Here is some example code in Node for persisting the alert:

const AWS = require('aws-sdk');
const docClient = new AWS.DynamoDB.DocumentClient();

const params = {
  TableName: TABLE_NAME,
  Item: {
    userId,
    // the composite sort key makes identical alerts overwrite each other
    alertId: `${product}:${userId}:${targetPrice}:${operator}`,
    product,
    targetPrice,
    operator
  },
  ReturnValues: 'ALL_OLD'
};

docClient.put(params) // ...
My Question:
It feels kinda wrong to misuse the sort key like that. While it does cover all my requirements (no duplicates, reads are easy and should be relatively fast), I was wondering if there isn't a better way of doing this. Maybe with indices or the like?
I kinda like the flat data structure (just items in a table) but maybe there is another way of creating unique alerts for different targetPrices/operators/products/users without creating duplicates?
So I guess my question is: Is there a better way of doing this while fulfilling the requirements I am working with?
Thank you very much in advance!
Very interesting question. On one hand, with product as the partition key you get querying simplicity, but you also distribute your data unevenly. What if one product is a big success and takes 50% of all load (the "hot partitions" problem, detailed here: https://cloudonaut.io/dynamodb-pitfall-limited-throughput-due-to-hot-partitions/)? In that case you'll probably encounter read or write throttling. DynamoDB advises adding some randomness (e.g. a random suffix from 1 to 1000) to avoid such uneven distribution. You can learn more about these strategies here: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-partition-key-sharding.html#bp-partition-key-sharding-random
But it depends on how certain you are about the hot-partition risk. If you're sure you won't have hot partitions (products with many more alerts than others), maybe it's better to keep the schema simple for now?
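To make the write-sharding idea concrete, here is a hedged TypeScript sketch. The shard count, key format, and hash are arbitrary illustrative choices, not anything DynamoDB prescribes:

```typescript
const SHARD_COUNT = 10; // tune to your write volume

// Spread a hot product's alerts across several partitions by
// appending a deterministic shard suffix derived from the user.
function partitionKey(product: string, userId: string): string {
  let hash = 0;
  for (const ch of userId) hash = (hash * 31 + ch.charCodeAt(0)) >>> 0;
  return `${product}#${hash % SHARD_COUNT}`;
}

// Reading "all alerts for a product" now means querying every shard:
function allPartitionKeys(product: string): string[] {
  return Array.from({ length: SHARD_COUNT }, (_, i) => `${product}#${i}`);
}
```

The trade-off is exactly what the linked AWS page describes: writes spread evenly, but a full read of one product becomes SHARD_COUNT queries.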
Hi, it's my first time with DDD/CQRS. I've read multiple sources of knowledge and I'm still a bit confused; maybe someone could help :)
Let's assume a simple case where we have products and clients (possibly different bounded contexts).
A client can buy a product and he wants to see all products that he purchased.
In this case I realize I need a UserPurchasesView view model with:
purchaseId (which is a mongo primary key)
userId
product: {id, name, image, shortDescription, [maybe some others]}
price
timestamp
Now... the problem is that my domain produces an event like UserPurchasedProduct(userId, productId). I could enrich the event with the price, product name, or maybe something else, but not all the fields. I'm getting to a point where enriching seems to be wrong.
At this point I realize I need something like ProductDetailsView:
productId (primary key)
price
name
shortDescription
logo
This view is maintained by events like: ProductCreated, ProductRenamed, ProductImageChanged
And now we have 2 options ...
Look into the ProductDetailsView when a UserPurchasedProduct event comes in, take all the needed product details, and save them in UserPurchasesView for faster reads. This solution doesn't look that bad, but it introduces some extra coupling, and it seems to me these views cannot be scaled well when needed. Also, both views must be rebuilt together when replaying all events from the event store (rebuilding is also trickier in that case).
Keep only the productId in the UserPurchasesView and read multiple views when the user queries his purchases. This is some extra processing that would have to be done somewhere: in the frontend, in the backend controller, or in some high-level read model API. UPDATE: I also realized that I would need to keep at least the price, and maybe the name, of the product in the UserPurchasesView (in case it changes). Sometimes you need the value from the time of purchase and sometimes you need the recent value; the scenario depends on the business, but we could imagine both.
None of these solutions looks perfect to me. Am I wrong, am I missing something or is it just the way to do it? Thanks!
You understand well.
So you have to choose between coupling between the read models and coupling between UI and individual read models.
One of the main advantages of CQRS/ES is the possibility to create blazing fast read models (views if you like), without any joins - the perfect cache, as I have seen it called. I personally have chosen the first approach every time, with full data denormalisation. The views are very fast and the models very clean and clear. This is the perfect solution if you want to optimize the read side of your application (and I think you should).
By listening to the right events you can keep these read models in sync with the rest of the application.
There is a 3rd option:
The projection responsible for the UserPurchasesView view not only listens to UserPurchasedProduct events, but also to ProductCreated, ProductRenamed, ProductImageChanged - any product related events that affect the UserPurchasesView. Now, as well as the UserPurchasesView collection for the read model that it is responsible for, it also needs a private collection to maintain the bits of products it is interested in: ({id, name, image, shortDescription, [maybe some others]}), so that when a new purchase event comes in, you have somewhere to get the initial state of those product fields from. Since your UserPurchasesView needs to listen to some of those product events anyway in order to keep up to date when a product changes, this isn't really much extra work, and avoids any dependency on another projection (ProductDetailsView). The cross-projection dependency also has a potential problem due to eventual consistency - what if the product isn't even in the product details view yet when the UserPurchasedProduct event comes through?
To avoid any concurrency issues, it's simplest to have each projection managed only by a single process and a single thread. That way, as long as the projection can receive events in-order across streams (so that it is guaranteed to see the product creation before the product purchase), you won't have issues with seeing a purchase before the product exists. If you introduce sharding or any other multi-threading to your projection, it gets more complicated.
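That third option can be sketched as a single projection with its own private product lookup. This is an in-memory TypeScript illustration with hypothetical event shapes; the real projection would persist both collections:

```typescript
// Hypothetical event shapes based on the question.
interface ProductCreated { productId: string; name: string; price: number; }
interface ProductRenamed { productId: string; name: string; }
interface UserPurchasedProduct { userId: string; productId: string; }

interface Purchase { userId: string; productId: string; name: string; price: number; }

class UserPurchasesProjection {
  // Public read model: the UserPurchasesView rows.
  readonly purchases: Purchase[] = [];
  // Private lookup: only the product fields this projection needs.
  private products = new Map<string, { name: string; price: number }>();

  onProductCreated(e: ProductCreated): void {
    this.products.set(e.productId, { name: e.name, price: e.price });
  }

  onProductRenamed(e: ProductRenamed): void {
    const p = this.products.get(e.productId);
    if (p) p.name = e.name;
  }

  onUserPurchasedProduct(e: UserPurchasedProduct): void {
    // Relies on in-order delivery: the product events were seen first.
    const p = this.products.get(e.productId);
    if (!p) throw new Error(`unknown product ${e.productId}`);
    // Snapshot the price at purchase time, per the UPDATE in the question.
    this.purchases.push({ userId: e.userId, productId: e.productId, name: p.name, price: p.price });
  }
}
```

Because the projection owns its private product collection, there is no dependency on the ProductDetailsView projection and no eventual-consistency race between the two.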
I have a system where actions of users need to be sent to other users who subscribe to those updates. There aren't a lot of users/subscribers at the moment, but it could grow rapidly so I want to make sure I get it right. Is it just this simple?
create table subscriptions (
  person_uuid uuid,
  subscribes_person_uuid uuid,
  primary key (person_uuid, subscribes_person_uuid)
)
I need to be able to look up things in both directions, i.e. answer the questions:
Who are Bob's subscribers?
Who does Bob subscribe to?
Any ideas, feedback, suggestions would be useful.
Those two queries represent the start of your model:
you want the user to be the PK or part of the PK.
Depending on the cardinality of subscriptions/subscribers you could go with:
for low numbers: a single table with two sets
for high numbers: 2 tables similar to the one you describe
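For the two-table variant, the write path has to maintain both directions. Here is a minimal in-memory TypeScript sketch of that dual-write idea (in the real system each Map would be a Cassandra table partitioned by its key column; the names are mine):

```typescript
// subscriptions_by_person: person -> the people they subscribe to
const subscriptions = new Map<string, Set<string>>();
// subscribers_by_person: person -> the people subscribed to them
const subscribers = new Map<string, Set<string>>();

function addTo(table: Map<string, Set<string>>, key: string, value: string): void {
  let set = table.get(key);
  if (!set) { set = new Set(); table.set(key, set); }
  set.add(value);
}

// Every subscribe writes to both "tables", so each direction
// can later be answered from a single partition key lookup.
function subscribe(person: string, target: string): void {
  addTo(subscriptions, person, target); // who does `person` subscribe to
  addTo(subscribers, target, person);   // who subscribes to `target`
}

const subscribedTo = (person: string) => [...(subscriptions.get(person) ?? [])];
const subscribersOf = (person: string) => [...(subscribers.get(person) ?? [])];
```

This is the usual Cassandra trade: duplicate the data at write time so that every query you care about hits exactly one partition.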
@Jacob
Your use case looks very similar to the Twitter example; I modelled it here
If you want to track both sides of the relationship, you'll need a dedicated table to index them.
Last but not least, depending on whether the users are mutable or not, you can decide to denormalize (e.g. duplicate user content) or just store user ids and then fetch the user content from a separate table.
I've implemented a simple join feature in Achilles. Have a look if you want to go this way.