Two streams for inter-related models? - getstream-io

If we have users and posts - and I can follow a user (and see all their posts), or follow a particular post (and see all of its edits/updates), would each post be pushed to two separate streams, one for the user and another for the post?
My concern is that if a user follows an idea, and also the user feed, their aggregated activity feed could show multiple instances of the same idea, one from each feed.

Every unique activity will appear at most once in a feed. To give the activity the exact same internal ID in every feed, you might try using the to field, which adds an activity to different feed groups with the same activity UUID.
If that is not possible, you can still make the activity unique by sending the same time and foreign_id values with each insert.
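A minimal sketch of both suggestions with the stream-python client; the API key, feed names, IDs, and timestamp are hypothetical placeholders:

```python
import datetime
import stream

client = stream.connect("YOUR_API_KEY", "YOUR_API_SECRET")

user_feed = client.feed("user", "42")
user_feed.add_activity({
    "actor": "user:42",
    "verb": "post",
    "object": "post:123",
    # foreign_id + time keep repeated inserts of this activity unique
    "foreign_id": "post:123",
    "time": datetime.datetime(2018, 1, 15, 9, 30),
    # the to field copies the same activity (same UUID) into the post's feed
    "to": ["post:123"],
})
```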
Cheers!

Related

Command across multiple aggregates with CQRS and ES

I'm having an odd case while thinking about a solution for my problem.
A quick recap: I'm using an event store with CQRS, and I have two aggregates called 'Group' and 'User'.
Basically a User defines some characteristics like his region, age, and a couple of interests.
He can then choose to 'match' with a Group that is in the same region, around the same age, and has the same interests.
Now here's the case: the 'matchmaking' part should happen completely on the backend. It can be a long-running process, but for the client it's just one call to the endpoint, and the end result should be the user matching with a group.
So for this case, I have to query the groups which have the same region and the same age slice; the interests don't really matter in my query. I now have a list of groups, and the matchmaker is going to give each group a rating based on the common interests between the group and the user. The group with the best rating will be joined.
So again, using CQRS and ES, my problem is that this case seems to be a mix of queries and a command, and mixing queries into a match command seems to go against the purpose of CQRS.
Querying multiple groups and filtering them against my write side, the event store, is also a bad idea, as the aggregates have to be rebuilt and loaded into memory before they can be filtered out.
So I'm kind of stuck here. Something is telling me that a long-running process / saga could be an answer to my problem, but I don't see how I would avoid mixing queries and commands in my saga, as a saga is basically a chain of commands/events.
How do I tackle this specific case? No real code is needed; a conceptual solution to get me going is perfect.
Hi, this is actually a case where CQRS can shine.
Creating a dedicated matching model seems ideal for this case, since it lets you answer what might otherwise be a rather non-trivial query.
So:
1) Create a dedicated (possibly ephemeral, possibly checkpointed/persisted) query model as a derived store.
2) Upon request, run a query to get the top matches.
3) Based on the results of the query, send a command to update the event store with the new links.
The query model will not need to handle commands and can be updated on a push basis from the event store. This keeps it rather simple to build and keep up to date, and it can further be optimized to hold only the data needed for this particular query.
An in-memory graph might do well.
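As a rough illustration of the dedicated query model (not tied to any framework; the event names, fields, and scoring rule are assumptions made up for the sketch):

```python
class GroupMatchingModel:
    """Derived, in-memory read model holding only what the matching query needs."""

    def __init__(self):
        self.groups = {}  # group_id -> {"region": str, "age_band": str, "interests": set}

    def apply(self, event):
        # Push-based projection: called for each event coming off the event store.
        if event["type"] == "GroupCreated":
            self.groups[event["group_id"]] = {
                "region": event["region"],
                "age_band": event["age_band"],
                "interests": set(event.get("interests", [])),
            }
        elif event["type"] == "GroupInterestsChanged":
            self.groups[event["group_id"]]["interests"] = set(event["interests"])

    def best_match(self, region, age_band, user_interests):
        # Filter by region and age band, then rate by shared interests.
        candidates = [
            (gid, g) for gid, g in self.groups.items()
            if g["region"] == region and g["age_band"] == age_band
        ]
        if not candidates:
            return None
        return max(
            candidates,
            key=lambda item: len(item[1]["interests"] & set(user_interests)),
        )[0]
```

The match endpoint would query best_match and then issue the ordinary "join group" command against the write side.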
-Chris
p.s.
On the command side: the commands here would each only update a single aggregate instance.
Further, using the write-ahead pattern would avoid the need for any sort of process manager or "saga."
e.g.
For each new membership: one command to add the new membership to the user stream, then one command to the group to add the new member information. A simple audit process can then scan for incomplete membership assignments, both on startup/recovery and as a periodic data-quality check.
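A sketch of that write-ahead flow; the command names and the command_bus/event_store interfaces are illustrative assumptions, not a real API:

```python
def match_user_to_group(command_bus, user_id, group_id):
    # 1) Record the intent on the user stream first (the "write ahead").
    command_bus.send({"type": "AddMembershipToUser", "user_id": user_id, "group_id": group_id})
    # 2) Then record the new member on the group stream.
    #    Each command touches exactly one aggregate instance.
    command_bus.send({"type": "AddMemberToGroup", "group_id": group_id, "user_id": user_id})

def audit_incomplete_memberships(event_store, command_bus):
    # Periodic data-quality check: re-issue the group-side command for any
    # membership written to the user stream that never completed.
    for membership in event_store.find_user_memberships_without_group_member():
        command_bus.send({
            "type": "AddMemberToGroup",
            "group_id": membership["group_id"],
            "user_id": membership["user_id"],
        })
```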
-Chris

Back-filling a feed?

Is there a way to insert activities into a feed so they appear as if they were inserted at a specific time in the past? I had assumed that when adding items to a feed it would use the 'time' value to sort the results, even when propagated to other feeds following the initial feed, but it seems that's not the case and they just get sorted by the order they were added to the feed.
I'm working on a timeline view for our users, and I have a couple of reasons for wanting to insert activities at previous points in time:
1) We have a large number of entities in our database but a relatively small number of them will be followed (especially at first), so to be more efficient I had planned to only add activities for an entity once it had at least one follower. Once somebody follows it, I would like to go back 14 days and insert activities for that entity as if they were created at the time they occurred, so the new follower would see them in their feed at the appropriate place. Currently they will just see a huge group of activities from the past at the top of their feed, which is not useful.
2) Similarly, we already have certain following relationships within our database and at launch I would like to go back a certain amount of time and insert activities for all entities that already have followers so that the feed is immediately useful.
Is there any way to do this, or am I out of luck?
My feeds are a combination of flat and aggregated feeds - the main timeline for a user is aggregated, but most entity feeds are flat. All of my aggregation groups would be based on the time of the activity so ideally there would be a way to sort the final aggregation groups by time as well.
Feeds on Stream are sorted differently depending on their type:
Flat feeds are sorted by activity time, descending.
Aggregated feeds and Notification feeds sort activity groups by last-updated time (activities inside groups are sorted by time, descending).
This means that you can back-fill flat feeds but not aggregated feeds.
One possible way to get something similar to what you describe is to create the follow relationship with copy_limit set to a low number, so that only the most recent activities are propagated to followers.
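A hedged sketch of both points with the stream-python client (feed names and IDs are made up; the copy limit is assumed to be exposed as the activity_copy_limit keyword in that client):

```python
import datetime
import stream

client = stream.connect("YOUR_API_KEY", "YOUR_API_SECRET")

# Back-fill the flat entity feed with a past timestamp; flat feeds sort by time.
entity_feed = client.feed("entity", "17")
entity_feed.add_activity({
    "actor": "entity:17",
    "verb": "update",
    "object": "update:900",
    "foreign_id": "update:900",
    "time": datetime.datetime.utcnow() - datetime.timedelta(days=10),
})

# Follow with a low copy limit so only the most recent activities propagate.
timeline = client.feed("timeline_aggregated", "7")
timeline.follow("entity", "17", activity_copy_limit=10)
```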

Dialogflow entity list on server

I think this question has been asked before, but there are no clear answers.
The question is simple.
Can you have an entity list on the server?
For example, I have a list of product names in my database, which can be really big. I want the intent to recognise these entities based on a list on the server.
The other thing I would like to do is filter an entity list.
e.g. I have a list of stores. I want it to be filtered by location, say by distance from a given lat/long, showing only stores near you when I ask a question.
Things which are so easy to do in apps seem so difficult in Dialogflow.
Please do not provide solutions which can be done on the server through webhooks. I already know about that and have used it.
I just want a better way to use entities so that the NLP can become more powerful.
The best way to do this is to use entities with a webhook.
You may enable slot filling for the parameters.
In the webhook, keep a set of stores per location: a hashmap with the location as the key and the set of stores as the value.
When the location is provided, fetch the corresponding set of stores.
When the store is provided, check whether that store is present in the set.
Reprompt if the information is not correct, resetting the context if required.
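A minimal Flask webhook sketch of that approach. The parameter names ("location", "store") and the store data are illustrative assumptions; the request/response shape follows Dialogflow's V2 fulfillment format:

```python
from flask import Flask, request, jsonify

app = Flask(__name__)

# Hashmap keyed by location; the value is the set of stores at that location.
STORES_BY_LOCATION = {
    "downtown": {"Acme Mart", "Green Grocer"},
    "uptown": {"Corner Shop"},
}

@app.route("/webhook", methods=["POST"])
def webhook():
    params = request.get_json()["queryResult"]["parameters"]
    location = params.get("location", "").lower()
    store = params.get("store", "")

    stores = STORES_BY_LOCATION.get(location, set())
    if not stores:
        # Slot filling: ask for the missing/unknown location.
        return jsonify({"fulfillmentText": "Which area are you in?"})
    if store and store not in stores:
        # Reprompt with the valid options for that location.
        options = ", ".join(sorted(stores))
        return jsonify({"fulfillmentText": f"I only know {options} near {location}. Which one?"})
    return jsonify({"fulfillmentText": f"Great, {store or 'a store'} in {location} it is."})
```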
UPDATE
You may ask the user for the product names. Match the entity name against the names in the DB. If present, use it; if not, offer the user some options from the DB that may match what they are saying and ask them to choose one. You need to think, from a conversational point of view, about how two people communicate with each other.

CQRS/Event Sourcing - Does one expect to receive an Aggregate Id from the user/request?

I am currently just trying to learn some new programming patterns and I decided to give event sourcing a shot.
I have decided to model a warehouse as my aggregate root, in the domain of shipping/inventory, where the number of warehouses is generally pretty constant (i.e. a company won't be adding warehouses too often).
I have run into the question of how to set my aggregateId, which should correspond to a warehouse, on my server. Most examples I have seen, including this one, show the aggregate ID being generated server side when a new aggregate is being created (in my case a warehouse), and then passed in the command request when referring to that aggregate for subsequent commands.
Would you say this is the correct approach? Can I expect the user to know and pass aggregate IDs when issuing commands? I realize this is probably domain dependent and could also be a UI/UX choice; I'm just wondering what others have done. It would make more sense to me if my event-sourced aggregates were created more frequently, such as with meal tabs or shopping carts.
Thanks!
Heuristic: the aggregate ID is, in many cases, analogous to the primary key used to distinguish entities in a database table. Many of the lessons of natural vs. surrogate keys apply.
Can I expect the user to know and pass aggregate Ids when issuing commands?
You probably can't depend on the human to know the aggregate ids. But the client that the human operator is using can very well know them.
For instance, if an operator is going to be working in a single warehouse during a session, then we might look up the appropriate identifier, cache it, and use it when constructing messages on behalf of the user.
Analog: when you fill in a web form and submit it, the browser does the work of looking at the form action and using that information to construct the correct URI, and similarly the correct HTTP Request.
The client will normally know what the ID is, because it just got it during a previous query.
Creation patterns are weird. It can, in some circumstances, make sense for the client to choose the identifier to be used when creating a new aggregate. In others, it makes sense for the client to provide an identifier for the command message, and the server decides for itself what the aggregate identifier should be.
It's messaging, so you want to be careful about coupling the client directly to your internal implementation details -- especially if that client is under a different development schedule. If you get the message contract right, then the server and client can evolve in any way consistent with the contract at any time.
You may want to review Greg Young's 10 year retrospective, which includes a discussion of warehouse systems. TL;DR - in many cases the messages coming from the human operators are events, not commands.
Would you say this is the correct approach?
You're asking if one of Greg Young's Event Sourcing samples represents the correct approach... Given that the combination of CQRS and Event Sourcing was essentially (re)invented by Greg, I'd say there's a pretty good chance of that.
In general, letting the code that implements the command side generate a GUID for every Command, Event, or other persistent object that it needs to write is by far the simplest implementation, since GUIDs are effectively guaranteed to be unique. In a distributed system, uniqueness without coordination is a big thing.
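A tiny sketch of that point; the command shape and field names are made up for illustration:

```python
import uuid

def create_warehouse_command(name):
    # The command side mints its own identifiers, so no coordination is needed.
    return {
        "command_id": str(uuid.uuid4()),    # identifies this message
        "type": "CreateWarehouse",
        "warehouse_id": str(uuid.uuid4()),  # aggregate id, chosen without a round trip
        "name": name,
    }
```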
Can I expect the user to know and pass aggregate Ids when issuing commands?
No, and you particularly can't expect a user to know the GUID of their assets. What you may be able to do is to present the user with a list of his or her assets. Each item in the list will have the GUID associated, but it may not be necessary to surface that ID in the user interface. It's just data that the underlying UI object carries around internally.
In some cases, users do need to know the ID of some of their assets (e.g. if it involves phone support). In that case, you can add a lookup API to address that concern.

Aggregate feed, removing duplicates in getstream

I have followed the advice here: stackoverflow aggregate answer
I am grouping posts together (shares for the same post together, likes for the same posts together, regular posts as single activities). What I'm noticing, however, is that I end up with duplicates for a user. If a user shares a post and also likes the post, it shows up twice on their getstream feed. Right now, I have to do filtering in my own backend service with a certain order (if you share a post, remove the activity if you also liked it; if you like a post, then remove the regular post). Is there a better way to solve this problem of duplicates?
One idea that comes to mind: when you post an activity for a share, make sure you send a foreign_id and time (sending both will avoid duplicates in our system). Then if the user also 'likes' the activity, you could store a like counter in the activity metadata and send an update with the same foreign_id, incrementing the like count.
Keep in mind that updates don't push to aggregated feeds or notification feeds, though, so you'd still want to push that 'like' activity to those feeds, too.
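A hedged sketch of that suggestion with the stream-python client, assuming its update_activities call (which matches on foreign_id + time); the IDs, verb, and the "likes" counter field are made up:

```python
import datetime
import stream

client = stream.connect("YOUR_API_KEY", "YOUR_API_SECRET")
when = datetime.datetime(2018, 3, 1, 12, 0)

# Add the share once, keyed by foreign_id + time.
client.feed("user", "42").add_activity({
    "actor": "user:42",
    "verb": "share",
    "object": "post:123",
    "foreign_id": "share:post:123:user:42",
    "time": when,
    "likes": 0,
})

# Later, the same user likes the post: bump the counter on the existing
# activity instead of inserting a second one.
client.update_activities([{
    "actor": "user:42",
    "verb": "share",
    "object": "post:123",
    "foreign_id": "share:post:123:user:42",
    "time": when,
    "likes": 1,
}])
```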
