How to specify feed when using 'updateActivities' with Stream? - getstream-io

The updateActivities method in the Stream API is perplexing, as the docs seem to indicate that a feed is not specified during this operation. How is this supposed to work?
The other activity methods (addActivity, removeActivity) are performed on a feed object, which makes sense. But the docs show updateActivities as a method on the client object, with no way to specify the feed containing the activity.
From the docs:
var now = new Date();
activity = {
  "actor": "1",
  "verb": "like",
  "object": "3",
  "time": now.toISOString(),
  "foreign_id": "like:3",
  "popularity": 100
};
// first time the activity is added
user1.addActivity(activity);
// update the popularity value for the activity
activity.popularity = 10;
// send the update to the APIs
client.updateActivities([activity]);
My expectation (and the only thing that makes sense, as far as I can tell), would be that the updateActivities method would be on the feed object, since a foreign_id is not unique across all feeds.
(Previous assumption based on lots of experience using identical foreign IDs across multiple feeds.)

When an activity is added to a feed, Stream generates a unique ID for it and uses that ID to propagate the activity to the feed it was added to and to any follower feeds. Feeds store only references to activities, not copies.
Stream also guarantees that the ID is consistent for the same time and foreign_id values: if you add an activity with the same time and foreign_id, it always ends up with the same ID.
This lets you control activity uniqueness and update every occurrence of an activity without keeping track of all the feeds that may reference it (the "to" targets and follow relationships would make that a very complex task!).
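To make the idea concrete, here is a minimal Python sketch of deterministic IDs with feeds that hold only references. It is not Stream's actual algorithm: the activity_id helper, the uuid5 scheme, the namespace, and the in-memory dictionaries are all assumptions for illustration. It only shows why an update keyed on foreign_id and time needs no feed argument.

```python
import uuid

# Hypothetical namespace for this sketch only; not Stream's real ID scheme.
NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "feeds.example.invalid")

activities = {}                            # activity ID -> activity data, stored once
feeds = {"user:1": [], "timeline:2": []}   # feed name -> activity IDs (references only)


def activity_id(foreign_id, time_iso):
    """Same foreign_id and time always give the same ID."""
    return str(uuid.uuid5(NAMESPACE, f"{foreign_id}|{time_iso}"))


def add_activity(feed, activity):
    aid = activity_id(activity["foreign_id"], activity["time"])
    activities[aid] = dict(activity)
    if aid not in feeds[feed]:
        feeds[feed].append(aid)
    return aid


def update_activities(updated):
    """No feed argument: the ID is recomputed from foreign_id and time."""
    for activity in updated:
        aid = activity_id(activity["foreign_id"], activity["time"])
        if aid in activities:
            activities[aid] = dict(activity)


def read_feed(feed):
    return [activities[aid] for aid in feeds[feed]]


activity = {"actor": "1", "verb": "like", "object": "3",
            "time": "2024-01-01T00:00:00", "foreign_id": "like:3",
            "popularity": 100}
aid = add_activity("user:1", activity)
feeds["timeline:2"].append(aid)  # simulate fan-out to a follower feed

activity["popularity"] = 10
update_activities([activity])
```

After the update, both read_feed("user:1") and read_feed("timeline:2") return the activity with the new popularity, because each feed only points at the single stored copy.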

Related

Fixed Sized Bucket - Nested JSON Array

I have a stream of events coming from a particular user. I am using Cosmos DB to store my user profiles. Take the following example JSON object. Here I want to store only a limited set of events, say 2. As soon as the 3rd event comes in, I want to remove the oldest one and add the new one, so the bucket size is never exceeded. The easy way is to pull the record for that user on each update, modify it, and write it back. I was wondering whether there is a more efficient way to achieve the same thing.
{
  "id": "4717cd3c-78d9-4a0e-bf5d-4645c97bd55c",
  "email": "abc#acme.org",
  "events": [
    {
      "event": "USER_INSTALL",
      "time": 1641232180,
      "data": {
        "app": "com.abc"
      }
    },
    {
      "time": 1641232181,
      "event": "USER_POST_INSTALL",
      "data": {
        "app": "com.xyz"
      }
    }
  ]
}
There are no options to limit an array size, within a document. You would need to modify this on your own, as you're currently doing. Even if you stored each array item as a separate document, you would still need to periodically purge older documents on your own. At least, with independent documents per event, you could consider purging older events via ttl, but you still wouldn't be able to specify an exact number of documents to keep.
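The modify step of that read-modify-write cycle can be kept small. Below is a minimal Python sketch of the trimming logic only; the append_bounded helper and its max_events parameter are hypothetical names, and reading the document from and writing it back to Cosmos DB is left to the caller.

```python
def append_bounded(doc, event, max_events=2):
    """Return a copy of doc whose "events" array holds at most
    max_events entries, keeping the newest by "time"."""
    events = sorted(doc.get("events", []) + [event], key=lambda e: e["time"])
    updated = dict(doc)
    updated["events"] = events[-max_events:]
    return updated


profile = {
    "id": "4717cd3c-78d9-4a0e-bf5d-4645c97bd55c",
    "events": [
        {"event": "USER_INSTALL", "time": 1641232180, "data": {"app": "com.abc"}},
        {"event": "USER_POST_INSTALL", "time": 1641232181, "data": {"app": "com.xyz"}},
    ],
}
# Hypothetical third event, used only to exercise the helper.
third = {"event": "USER_LOGIN", "time": 1641232182, "data": {"app": "com.xyz"}}
trimmed = append_bounded(profile, third)
```

Here trimmed keeps the two newest events and drops USER_INSTALL, while the original profile dictionary is left untouched.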

CouchDB filter function and continuous feed

I have a filter function that filters on a document property, e.g. "version: A", and it works fine until a document is updated so that this property is removed (or changed to "version: B").
At that point I would like to be notified that the document has been updated, similar to the notification sent when a document is deleted, but I couldn't find an effective way to do this (without listening to and processing all document changes).
I hope I'm just missing something and it's not a design limitation.
While my other answer is a valid approach, I had this same situation yesterday and decided to look at making this work using Mango selectors. I did the following:
1. Establish a changes feed filtered by the query selector (see the "_selector" filter for /db/_changes)
2. Perform the query (/db/_find) and record the results
3. Establish a second changes feed that filters for just the documents returned in (2) (see the "_doc_ids" filter for /db/_changes)
The feed at (1) lets you know when new documents match your query, along with edits to documents that matched your query both before and after the change.
The feed at (3) lets you know when a change is made to a document that previously matched your query, irrespective of whether it still matches after the change.
The combination of these feeds covers all cases, though with some false positives. On a change in either feed, tear down the changes feed at (3) and redo steps (2) and (3).
Now, some notes on this approach:
This is really only suitable in cases where the number of documents returned by the query is small, because of the filtering by _id in the second feed.
Care must be taken to ensure that the second feed is established correctly if there are lots of changes coming in from the first changes feed.
There are cases where a change will appear in both feeds. It would be good to avoid reacting twice.
If changes are expected to happen frequently, then employ debouncing or rate limiting if your client does not need to process each and every change notification.
This approach worked well for me and the cases I had to deal with.
References:
http://docs.couchdb.org/en/stable/api/database/find.html
http://docs.couchdb.org/en/stable/api/database/changes.html
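The three requests and the "avoid reacting twice" note can be sketched in Python as follows. This is only an outline under assumptions: the helper names are hypothetical, no HTTP call is made (each function just returns the method, path, and JSON body you would send), and deduplication is keyed on the document id plus the first revision listed in each change row.

```python
import json
from urllib.parse import urlencode


def selector_feed_request(db, selector, since="now"):
    """Step 1: continuous changes feed filtered by a Mango selector."""
    query = urlencode({"feed": "continuous", "filter": "_selector", "since": since})
    return "POST", f"/{db}/_changes?{query}", json.dumps({"selector": selector})


def find_request(db, selector):
    """Step 2: run the query and fetch only the matching document ids."""
    return "POST", f"/{db}/_find", json.dumps({"selector": selector, "fields": ["_id"]})


def doc_ids_feed_request(db, doc_ids, since="now"):
    """Step 3: continuous changes feed limited to the ids found in step 2."""
    query = urlencode({"feed": "continuous", "filter": "_doc_ids", "since": since})
    return "POST", f"/{db}/_changes?{query}", json.dumps({"doc_ids": sorted(doc_ids)})


def dedupe(changes, seen):
    """Yield each (id, rev) pair once, even if both feeds report it."""
    for change in changes:
        key = (change["id"], change["changes"][0]["rev"])
        if key not in seen:
            seen.add(key)
            yield change


selector = {"version": "A"}
step1 = selector_feed_request("mydb", selector)
step2 = find_request("mydb", selector)
step3 = doc_ids_feed_request("mydb", {"doc2", "doc1"})

seen = set()
row = {"seq": "5-abc", "id": "doc1", "changes": [{"rev": "2-def"}]}
handled = list(dedupe([row, dict(row)], seen))  # same change seen on both feeds
```

On any change, the client would tear down the step 3 feed, rerun step 2, and build a new step 3 request from the fresh id list.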
The behaviour that you described is correct.
CouchDB populates the changes feed with the documents that pass the filter function. If you remove or modify the information that the filter function relies on, the filtered changes feed will ignore those updates.
The closest you will come to this is to use a view and filter the changes feed based on that view - see [1] for details.
You can create a simple view that includes the "version" as part of the key using a map function such as:
function (doc) {
  emit(doc.version, 1);
}
A changes feed filtered by this view will notify you of the insert or deletion of documents that have a "version" field as well as changes to the "version" field of existing documents. You can not, however, determine the previous value of the "version" field from the changes feed.
Depending on your requirements, you can make the view more targeted. For example, if you only cared about the transition from "A" to "B", then you could include only documents that have "A" or "B" as their "version":
function (doc) {
  if (doc.version === "A" || doc.version === "B") {
    emit(doc.version, 1);
  }
}
But be aware that this will not trigger a change notification on a transition from, say, "A" to "C" (or any other value for "version", including when the document is deleted), because change notifications are only sent when the map function emit()s at least one value for a document. It does not notify you when the map function used to emit at least one value for a given document but no longer does!
You can also filter the changes feed using Mango selectors, so if Mango queries work for you then perhaps this is simpler than using a view, but I'm not sure that you can be notified of deletions via Mango selectors...
EDIT:
My claim about the simple map function above is not quite right, as it will notify you of all document insertions and deletions, not just those with a "version" field. You can do this to avoid some of those false-positive notifications:
function (doc) {
  if (doc.hasOwnProperty('version') || doc.hasOwnProperty('_deleted')) {
    emit(doc.version, 1);
  }
}
That will give notifications for new documents with a "version" field, or an update that adds a "version" field to an existing document, but it will still notify of all deletions.
[1] http://docs.couchdb.org/en/stable/api/database/changes.html#changes-filter-view

Get single activity by id

How can I get an activity by its id (unique uuid) or by foreign_id + time?
I could not find this in the documentation. Everything there describes how to fetch a feed in pages, not a single activity.
If you save the id you get from adding an activity, then you can opt to fetch activities based on the id, using "id_gte" or "id_lte" and only fetch with offset 0 and limit 1. Such as:
$feed->getActivities(0, 1, ['id_gte' => $id]);
This code is PHP, but the SDKs for other languages should have equivalent functions if you need them.
The general goal should be to use Stream as a secondary data store. Your proprietary data from your customers should always be accessible in your own primary data store: most likely an RDBMS like PostgreSQL. When new follow relationships are created, or new activities are added, you should store them locally and replicate the data to GetStream. Then access feeds when a user wants to see a timeline or notification feed, and complement the feed data with data from your own DB (for example: comments, likes, author information, ...)
For this reason, there is no getActivity(uuid) method available.
Using the Python client, you can get an activity by its ID:
import stream
client = stream.connect('YOUR_API_KEY', 'API_KEY_SECRET')
client.get_activities(ids=[activity_id])
Or by its foreign_id + time:
client.get_activities(foreign_id_times=[
    (foreign_id, activity_time),
])
Taken from https://github.com/GetStream/stream-python/blob/main/README.md

Running an Azure function app each time a CosmosDB document is created and updating documents in a second collection

I have a scenario where items are saved in one DocumentDB collection, e.g. under /items/{documentId}. A document looks similar to:
{
  id: [guid],
  rating: 5,
  numReviews: 1
}
I have a second document collection under /user-reviews/{userIdAsPartitionKey}/{documentId}
The document will look like so:
{
  id: [guid],
  itemId: [guidFromItemsCollection],
  userId: [userId],
  rating: 4
}
When this document is uploaded, I want a trigger to fire that takes the new user rating document as input, retrieves the relevant document from the items collection, and transforms the items document based on the new data.
The crux of my problem is: how can I trigger off a document upsert, and how can I retrieve and modify a document from another collection, all within a Function App?
I've investigated the following links, which tease at the idea of triggers being possible on Cosmos DB, but the table suggests we can't hook up a trigger to a DocumentDB upload.
https://learn.microsoft.com/en-us/azure/azure-functions/functions-triggers-bindings
https://learn.microsoft.com/en-us/azure/azure-functions/functions-bindings-documentdb
If it's not possible to set up directly, my assumption is I should have a middle tier service handling the upsert (currently using DocumentClient from client side), which can kick off this processing itself, but I like the simplicity of the serverless function apps if possible.
Operations are scoped to a collection. You cannot trigger an operation in Collection B from an event in Collection A.
You'd either need to implement this in your app tier (as you suggested) or... store both types of documents in the same collection (a common scenario). You might need to add some type of doctype property to help filter your queries, but since documents are schema-free, you can store heterogeneous documents in the same collection.
Also: You mentioned an Azure Function. Within a function, there's nothing stopping you from making multiple database calls (e.g. when something happens in collection a and causes your function to be called, your function can perform an operation in collection b). Just note that this won't be transactional.
I know this is a pretty old question.
The Change Feed was built for this exact scenario.
In today's Azure Portal, there is even a menu option in the Cosmos DB blade that lets you create a Function triggered by changes in one collection, so you can detect and react to those changes, e.g. by creating a document in another collection.
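Whichever trigger ends up invoking the function, the transformation itself can stay a small pure step. The Python sketch below assumes that "rating" on the item is a running average over numReviews; the question does not say how the item should be transformed, so that rule and the apply_review name are assumptions. Loading the item from the items collection and writing it back are left out.

```python
def apply_review(item, review):
    """Fold one new user review into the item's running average.

    Assumes item["rating"] is the mean of item["numReviews"] ratings.
    """
    count = item["numReviews"]
    new_count = count + 1
    new_rating = (item["rating"] * count + review["rating"]) / new_count
    return {**item, "rating": new_rating, "numReviews": new_count}


item = {"id": "item-guid", "rating": 5, "numReviews": 1}
review = {"id": "review-guid", "itemId": "item-guid", "userId": "user-1", "rating": 4}
updated_item = apply_review(item, review)
```

With the sample documents from the question, updated_item has rating 4.5 and numReviews 2. Because the write to the second collection is not transactional, processing the same review twice would count it twice, so the caller may want to guard against repeats.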

DDD/CQRS: Combining read models for UI requirements

Let's use the classic example of blog context. In our domain we have the following scenarios: Users can write Posts. Posts must be cataloged at least in one Category. Posts can be described using Tags. Users can comment on Posts.
The four entities (Post, Category, Tag, Comment) are implemented as different aggregates because I have not found any rule under which one entity's data should affect another's. So, for each aggregate I will have one repository that represents it. Also, each aggregate references the others by ID.
Following CQRS, from this scenario I have deduced typical use cases that result in commands such as WriteNewPostCommand, PublishPostCommand, DeletePostCommand, etc., along with their respective queries to get data from repositories: FindPostByIdQuery, FindTagByTagNameQuery, FindPostsByAuthorIdQuery, etc.
Depending on which side of the app we are on (backend or frontend), the queries will be more or less complex. On the front page, we may need to build widgets showing the last comments, the latest post of a category, etc. These queries involve a simple Query object (few search criteria) and a very simple QueryHandler (a single repository as a dependency of the handler class).
But in other places these queries can be more complex. In an admin panel, we need to show a table of results that satisfy complex search criteria. It might be useful to search posts by author name (not ID), category names, tag names, publish date... These criteria belong to different aggregates and different repositories.
In addition, in our table of posts we don't want to show the post along with the author ID or category IDs. We need to show all the information (user name, avatar, category name, category icon, etc.).
My questions are:
At the infrastructure layer, when we design repositories, should the search methods (findAll, findById, findByCriterias...) return the corresponding entity with references to all associated IDs? I mean, if I have a method findPostById(uuid) or findPostByCustomFilter(filter), should it return a post instance with references to all its category IDs, tag IDs, and author ID? Or should my repo have some kind of method that populates a given post instance with the associations I want?
If I want to search posts created from 12/12/2014, written by John, categorised under the "News" and "Videos" categories and tagged "sci-fi" and "adventure", and get the full details of each aggregate, how should I create my Query and QueryHandler?
a) Create a Query with all my parameters (authorName, categoriesNames, TagsNames, and whether I want to retrieve the full details of the User, Category, and Tag associations), and then have its QueryHandler assemble the different read models into a single one. Or...
b) Create different Queries (FindCategoryByName, FindTagByName, FindUserByName) and have my web controller call them first, then call FindPostQuery, passing it the authorId, categoryId, and tagId returned by the other queries?
Solution b) appears cleaner, but it seems more expensive to me.
On the query side, there are no entities. You are free to populate your read models in any way suits your requirements best. Whatever data you need to display on (a part of) the screen, you put it in the read model. It's not the command side repositories that return these read models but specialized query side data access objects.
You mentioned "complex search criteria" -- I recommend you model it with a corresponding SearchCriteria object. This object would be technology-agnostic, but it would be passed to your query-side data access object, which would know how to combine the criteria to build a lower-level query for the specific data store it targets.
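A minimal Python sketch of that split might look like this. All names here (PostSearchCriteria, PostRow, InMemoryPostQueries) are hypothetical, and the in-memory list stands in for whatever store the read model actually lives in. A real data access object would translate the same criteria into SQL or a search-engine query.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class PostSearchCriteria:
    """Technology-agnostic description of what the UI is asking for."""
    author_name: Optional[str] = None
    category_names: tuple = ()
    tag_names: tuple = ()
    published_from: Optional[date] = None


@dataclass(frozen=True)
class PostRow:
    """Denormalised read-model row: names, not IDs."""
    title: str
    author_name: str
    category_names: tuple
    tag_names: tuple
    published: date


class InMemoryPostQueries:
    """Query-side data access object for one specific (toy) store."""

    def __init__(self, rows):
        self._rows = list(rows)

    def search(self, criteria):
        def matches(row):
            return (
                (criteria.author_name is None or row.author_name == criteria.author_name)
                and set(criteria.category_names) <= set(row.category_names)
                and set(criteria.tag_names) <= set(row.tag_names)
                and (criteria.published_from is None or row.published >= criteria.published_from)
            )
        return [row for row in self._rows if matches(row)]


rows = [
    PostRow("Dune review", "John", ("News", "Videos"), ("sci-fi", "adventure"), date(2015, 1, 3)),
    PostRow("Old post", "John", ("News", "Videos"), ("sci-fi", "adventure"), date(2014, 1, 1)),
    PostRow("Recipes", "Mary", ("Food",), ("cooking",), date(2015, 2, 1)),
]
criteria = PostSearchCriteria(
    author_name="John",
    category_names=("News", "Videos"),
    tag_names=("sci-fi", "adventure"),
    published_from=date(2014, 12, 12),
)
results = InMemoryPostQueries(rows).search(criteria)
```

The criteria object mirrors the example search from the question, and the result rows already carry the display data, so no command-side repository is involved.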
With simple applications like this, it's easier not to get distracted by aggregates. Do event sourcing, and subscribe one set of tables to the events so that they are easy to query the way you want.
In other words, it sounds like your main goal is to be able to query easily for the scenarios you describe. Start with that end goal. Now write your event handlers to adjust your tables accordingly.
Start with events and the UI. Then everything else will fit easily. Google "Event Modeling", as it will help you formulate ideas around what and how you want to build in this style of application.
I can see three problems in your approach and they need to be solved separately:
In CQRS the Queries are completely separate from the Commands. So, don't try to solve your queries with your command-side pipelines and repositories. The point of CQRS is precisely to allow you to solve the commands and queries in very different ways, as they have very different requirements.
You mention DDD in the question title, but you don't mention your Bounded Contexts in the question itself. If you follow DDD, you'll most likely have more than one BC. For example, in your question, it could be that CategoryName and AuthorName belong to two different BCs, which are also different from the BC where the blog posts are. If that is the case and each BC properly owns its own data, the data that you want to search by and show in the UI will be stored potentially in different databases, therefore implementing a query in the DB with a join might not even be possible.
Searching and Reading data are two different concerns and can/should be solved differently. When you search, you get some search criteria (including sorting and paging) and the result is basically a list of IDs (authorIds, postIds, commentIds). When you Read data, you get one or more Ids and the result is one or more DTOs with all the required data properties. It is normal that you need to read data from multiple BCs to populate a single page, that's called UI composition.
So if we agree on these 3 points and especially focussing on point 3, I would suggest the following:
Figure out all the searches that you want to do and see if you can decompose them to simple searches by BC. For example, search blog posts by author name is a problem, because the author information could be in a different BC than the blog posts. So, why not implement a SearchAuthorByName in the Authors BC and then a SearchPostsByAuthorId in the Posts BC. You can do this from the Client itself or from the API. Doing it in the client gives the client a lot of flexibility because there are many ways a client can get an authorId (from a MyFavourites list, from a paginated list or from a search by name) and then get the posts by authorId is a separate operation. You can do the same by tags, categories and other things. The Post will have Ids, but not the extra details about those IDs.
Potentially, you might want more complicated searches. As long as the search criteria (including sorting fields) contain fields from a single BC, you can easily create a read model and execute the search there. Note that this is only for the search criteria. If the search result needs data from multiple BCs you can solve it with UI composition. But if the search criteria contain fields from multiple BCs, then you'll need some sort of Search engine capable of indexing data coming from multiple sources. This is especially evident if you want to do full-text search, search by categories, tags, etc. with large quantities of data. You will need to use some specialized service like Elastic Search and it won't belong to any of your existing BCs, it'll be like a supporting service.
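The search-then-read split and the UI composition step can be sketched in Python as below. The two dictionaries stand in for separate bounded contexts, and every function name is hypothetical. In a real system each function would be a call to the owning BC's API.

```python
# Authors BC (hypothetical data)
AUTHORS = {
    "a1": {"name": "John", "avatar": "john.png"},
    "a2": {"name": "Mary", "avatar": "mary.png"},
}
# Posts BC (hypothetical data): posts hold only the author's ID
POSTS = {
    "p1": {"title": "First post", "author_id": "a1"},
    "p2": {"title": "Second post", "author_id": "a2"},
    "p3": {"title": "Third post", "author_id": "a1"},
}


def search_authors_by_name(name):
    """Search in the Authors BC: returns IDs only."""
    return [aid for aid, author in AUTHORS.items() if author["name"] == name]


def search_posts_by_author_id(author_id):
    """Search in the Posts BC: returns IDs only."""
    return [pid for pid, post in POSTS.items() if post["author_id"] == author_id]


def read_posts(post_ids):
    """Read in the Posts BC: IDs in, DTOs out."""
    return [{"id": pid, **POSTS[pid]} for pid in post_ids]


def read_authors(author_ids):
    """Read in the Authors BC: IDs in, DTOs out."""
    return {aid: AUTHORS[aid] for aid in author_ids}


def compose_post_list(author_name):
    """UI composition: combine DTOs from both BCs for one screen."""
    post_ids = []
    for author_id in search_authors_by_name(author_name):
        post_ids.extend(search_posts_by_author_id(author_id))
    posts = read_posts(post_ids)
    authors = read_authors({post["author_id"] for post in posts})
    return [
        {"title": post["title"],
         "author_name": authors[post["author_id"]]["name"],
         "avatar": authors[post["author_id"]]["avatar"]}
        for post in posts
    ]


page = compose_post_list("John")
```

This is option b) from the question made explicit: more round trips, but each BC keeps ownership of its own data and the composition can live in the client or behind an API.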
With CQRS you will have separate stacks for Queries and Commands. Your query stack should be a different module, namespace, DLL, or package in your project.
a) You will create one QueryModel, and this query model will return whatever you need. If you are familiar with Entity Framework or NHibernate, you will create a façade to hold these queries together: a DbContext or Session.
b) You can create these separate queries, but again, if you are familiar with any ORM, you should return the set that represents the model, return every set as IQueryable, and use LET (Linq Expression Trees) to make your query stack more dynamic.
Using Entity Framework and C#, for example:
public class QueryModelDatabase : DbContext, IQueryModelDatabase
{
    public QueryModelDatabase() : base("dbname")
    {
        _products = base.Set<Product>();
        _orders = base.Set<Order>();
    }

    private readonly DbSet<Order> _orders = null;
    private readonly DbSet<Product> _products = null;

    public IQueryable<Order> Orders
    {
        get { return this._orders.Include("Items").Include("Items.Product"); }
    }

    public IQueryable<Product> Products
    {
        get { return _products; }
    }
}
Then you should do queries the way you need and return anything:
using (var db = new QueryModelDatabase())
{
    var queryable = from o in db.Orders.Include(p => p.Items).Include("Details.Product")
                    where o.OrderId == orderId
                    select new OrderFoundViewModel
                    {
                        Id = o.OrderId,
                        State = o.State.ToString(),
                        Total = o.Total,
                        OrderDate = o.Date,
                        Details = o.Items
                    };
    try
    {
        var o = queryable.First();
        return o;
    }
    catch (InvalidOperationException)
    {
        return new OrderFoundViewModel();
    }
}
