Managing messages in a chat room with MongoDB and GraphQL - node.js

I am wondering how to manage messages in my chat room. My assumptions are:
There is a rooms collection with fields like id, messages, participants
There can be many rooms with many participants
Now, I have doubts:
Should I have a separate collection for messages (id, author, text, where author is a reference to the users collection)?
Or should I keep simple objects in messages instead of documents with refs?
I can imagine that a messages collection will be huuuuge (if it is not cleared). Will Mongo handle it? Or maybe there is a better way of doing that?
Regards

It depends on the scale of what you're building.
I would say that what meatsacks like you and me consider to be huge amounts is often peanuts for database systems (be it a relational database or a NoSQL datastore).
It's hard to say without knowing anything about the project, but I suspect you'll be better off if you design your data model for correctness and usefulness first, and worry about performance as a next step.
Based on the entities you describe (rooms, messages, participants, users, ...) I'm picturing an application such as Discord. In that case I would think of rooms and users as first-order entities, and of both participants and messages as (big) ordered lists of data belonging to a room (with each entry in either list also holding a reference to its user/author).
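To make that shape a bit more concrete, here is a minimal in-memory sketch in Python. It is not MongoDB code: the lists and dicts only stand in for collections, and every name (ChatStore, post, history, the field names) is made up for illustration. The point is that a message carries a room id and an author id instead of an embedded user document.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List


@dataclass
class User:
    id: str
    name: str


@dataclass
class Room:
    id: str
    participant_ids: List[str] = field(default_factory=list)  # references to User.id


@dataclass
class Message:
    id: int
    room_id: str      # the room this message belongs to
    author_id: str    # reference to User.id, not an embedded user
    text: str
    sent_at: datetime


class ChatStore:
    """Hypothetical stand-in for three collections: users, rooms, messages."""

    def __init__(self) -> None:
        self.users: Dict[str, User] = {}
        self.rooms: Dict[str, Room] = {}
        self.messages: List[Message] = []  # append-only, so already ordered

    def post(self, room_id: str, author_id: str, text: str) -> Message:
        # Reject messages from users who are not in the room.
        if author_id not in self.rooms[room_id].participant_ids:
            raise ValueError("author is not a participant of this room")
        msg = Message(len(self.messages) + 1, room_id, author_id, text,
                      datetime.now(timezone.utc))
        self.messages.append(msg)
        return msg

    def history(self, room_id: str, limit: int = 50) -> List[Message]:
        # Latest `limit` messages of one room, oldest first.
        in_room = [m for m in self.messages if m.room_id == room_id]
        return in_room[-limit:]
```

In a real database the filter in history would be an indexed query on the room id rather than a scan over all messages.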

Related

ArangoDB: having a few large collections vs. a lot small collections

I have a question regarding performance/best practice:
Scenario: I have a user-collection and a chatbot-collection. There can be a lot of users (let's say 100-1000) in the user-collection. Each user can have multiple chatbots (around 10 per user).
Option A: I create an edge collection to define the connection user -> chatbot. In the end I would have 1 user-collection, 1 chatbot-collection (containing all chatbots from all users) and 1 edge-collection (containing the links from each user to its chatbots).
Option B: I create a separate chatbot-collection for each user, to have all chatbots of a specific user in one place. The chatbot-collection name would be e.g. user_xyz(user._key)_chatbots. So if I need all chatbots of a user with the _key 'abc', I would check the collection user_abc_chatbots. In this case I don't need an edge collection for the connection user -> chatbot. In the end I would have 1 user-collection and a lot of user_xyz_chatbots-collections (depending on how many users I have, which can be 100-1000 as I wrote before).
Now my question: What is the better option, also regarding performance? Imagine I have to get all chatbots (or a specific chatbot) of a user each time I receive a request.
Would be awesome if you can give me feedback on your experience/thoughts :)
Looking at the numbers you posted, i.e. 100 - 1000 users and about 10 chatbots per user, this would mean just 1000 to 10000 chatbots in total.
For this dimension of data, I would say it makes more sense to store all chatbots in a single collection, and use an (indexed) attribute to store the user id for each chatbot. This is a 1:n relationship (1 user mapped to n chatbots)
That way you can easily and still quickly find all chatbots mapped to a particular user, but this setup will also allow you to make analyses for all users or all chatbots easily.
This would be much more difficult to achieve if the chatbots of each user were located in a different collection.
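As a rough illustration of the single-collection layout, here is a small Python sketch. It is plain in-memory code, not ArangoDB: the by_user dict merely plays the role of an index on the user attribute, and the names (ChatbotCollection, user_key, for_user, count_per_user) are invented for the example.

```python
from collections import defaultdict


class ChatbotCollection:
    """One collection for all chatbots, plus a lookup acting as an index."""

    def __init__(self):
        self.docs = {}                    # _key -> chatbot document
        self.by_user = defaultdict(set)   # user_key -> set of chatbot _keys

    def insert(self, key, user_key, name):
        self.docs[key] = {"_key": key, "user_key": user_key, "name": name}
        self.by_user[user_key].add(key)   # keep the "index" up to date

    def for_user(self, user_key):
        # All chatbots of one user, found through the index.
        return [self.docs[k] for k in sorted(self.by_user.get(user_key, ()))]

    def count_per_user(self):
        # A cross-user analysis that is easy because everything is in one place.
        return {user: len(keys) for user, keys in self.by_user.items()}
```

With one collection per user, count_per_user would have to visit every collection instead of reading a single structure.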
In addition, if the same chatbots can be mapped to multiple users, it may actually make sense to use three collections:
one collection for users
one collection for chatbots
and one mapping collection between users and chatbots
This would be an n:m relationship, in which each user can still be mapped to any number of chatbots, but if multiple users are mapped to the same chatbot, the data of each chatbot does not need to be stored redundantly.
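A tiny sketch of the three-collection variant, again as plain Python with made-up keys and names. The mapping is just a set of (user, chatbot) pairs, so a shared chatbot is stored once and linked from several users.

```python
# Hypothetical sample data; each dict or set stands in for one collection.
users = {"u1": {"name": "Ann"}, "u2": {"name": "Bob"}}
chatbots = {"c1": {"name": "helpdesk"}, "c2": {"name": "faq"}}
user_chatbot = {("u1", "c1"), ("u2", "c1"), ("u2", "c2")}  # mapping collection


def chatbots_of(user_key):
    """Keys of all chatbots mapped to one user."""
    return sorted(c for u, c in user_chatbot if u == user_key)


def users_of(chatbot_key):
    """Keys of all users mapped to one chatbot."""
    return sorted(u for u, c in user_chatbot if c == chatbot_key)
```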
I would only recommend to use separate chatbot collections per user if each chatbot has an individual data structure that is separate from all others, and that needs special indexing or querying. In this case it may make sense to separate different chatbots.
However, having too many collections (here we would think of at most 1,000) also isn't great, because each collection has a small overhead even when empty. This overhead is amortized much better when there are fewer collections that are used frequently than when there are many collections that are seldom used.

MongoDB standards for User/Room relations

So I have this MEAN-project I hobby on in my spare time.
Right now I'm setting up users and rooms, and am a bit hesitant about progressing further, as I am unsure about the proper protocol of db's in general.
As I recall, you're not supposed to have a Many-To-Many relationship; rather, you're supposed to have a relation table.
Right now, my User schema has an array of rooms he is in, and my Room schema has an array of users tied to it (the third and last schema being Message).
Is it better to have a userroomrelation doc that holds a PK, an id of one room, and then a list of all users in this room?
Thanks,
Rasmus
MongoDB isn't a relational database like *SQL databases (hence why MongoDB is called NoSQL), so using a relation table is fairly inefficient in Mongo. Holding an array of user _id's in the room collection is about as ideal as you could get, if you don't want repeated data.
Here are some more in-depth answers on many-to-many in MongoDB.
How can a User be in more than one room? Isn't that just a property on the User? And if you index that why would you also need to store it on the Room?
There is no one right way; it really depends on how many of each object you have. If it's a small number (as rooms and users implies), you may be better off with a simpler and more robust approach (one that cannot store impossible values), such as a single RoomId property on the user. That's never going to be inconsistent, and if you need to find the set of users in a given room it's a cheap query.
In MongoDB you CAN denormalize the data and store an array on each object containing part or all of the other object, but you can also create an effective join collection if you want to.
For example, you could have a collection {UserId, RoomId, DateTimeEntered, DateTimeLeft} with appropriate indexes, which allows you to quickly find all the users in a given room at a given time. Once you have the set of Ids you could load the users if you need them for display. Alternatively, you could add the fields you need for display to this collection {UserId, UserName, ...}, but then you have the problem of maintaining that data if it ever changes, or of keeping it intact if you need to know what the name was when they entered the room.
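Here is a rough Python sketch of that join collection and the "who was in this room at this time" lookup. It runs over a plain list rather than MongoDB. The Presence class and users_in_room function are invented names, and treating a missing DateTimeLeft as "still in the room" is an assumption of the sketch.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class Presence:
    user_id: str
    room_id: str
    entered: datetime
    left: Optional[datetime] = None  # assumption: None means still in the room


def users_in_room(presences: List[Presence], room_id: str, at: datetime) -> List[str]:
    """Ids of users whose stay in `room_id` covers the instant `at`."""
    return sorted({
        p.user_id
        for p in presences
        if p.room_id == room_id
        and p.entered <= at
        and (p.left is None or at < p.left)
    })
```

In a database the same filter would be served by an index on the room id and the entry time.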
There are also a TON of other questions on StackOverflow about how you should store related data. I suggest you go read those as well.

CQRS Read models in a NoSql (Mongo DB)

Hi, it's my first time with DDD/CQRS. I've read multiple sources and I'm still a bit confused, maybe someone could help :)
Let's assume a simple case where we have products and clients (possibly different bounded contexts).
A client can buy a product and he wants to see all products that he purchased.
In this case I realize I need a UserPurchasesView view model with:
purchaseId (which is a mongo primary key)
userId,
product: {id, name, image, shortDescription, [maybe some others]}
price
timestamp
Now ... the problem is that my domain is producing an event like UserPurchasedProduct(userId, productId). I could enrich the event with a price, product name or maybe something else, but not all fields. I'm getting to a point where enriching seems to be wrong.
At this point I realize I need something like ProductDetailsView:
productId (primary key)
price
name
shortDescription
logo
This view is maintained by events like: ProductCreated, ProductRenamed, ProductImageChanged
And now we have 2 options ...
Look into the ProductDetailsView when the UserPurchasedProduct event comes in, take all needed product details and save them in UserPurchasesView for faster reads. This solution doesn't look that bad, but it introduces some extra coupling, and it seems to me these views cannot be scaled well when needed. Also, both views must be rebuilt together when replaying all events from the event store (rebuilding is also more tricky in that case).
Keep only the productId in the UserPurchasesView and read multiple views when the user queries his purchases. This is some extra processing that would have to be done somewhere: in the frontend, in the backend controller or in some high-level read model API. UPDATE: I also realized that I would need to keep at least the price and maybe the name of the product in the UserPurchasesView (in case it changes), but sometimes you need the value from the time of the purchase and sometimes you need the most recent value. The scenario depends on the business, but we could imagine both.
Neither of these solutions looks perfect to me. Am I wrong, am I missing something, or is this just the way to do it? Thanks!
You understand well.
So you have to choose between coupling between the read models and coupling between UI and individual read models.
One of the main advantages of CQRS/ES is the possibility to create blazing fast read models (views if you like) without any joins: the perfect cache, as I saw it called. I have personally chosen the first approach every time, with full data denormalisation. The views are very fast and the models very clean and clear. This is the perfect solution if you want to optimize the read side of your application (and I think you should).
By listening to the right events you can keep these read models in sync with the rest of the application.
There is a 3rd option:
The projection responsible for the UserPurchasesView view not only listens to UserPurchasedProduct events, but also to ProductCreated, ProductRenamed, ProductImageChanged - any product related events that affect the UserPurchasesView. Now, as well as the UserPurchasesView collection for the read model that it is responsible for, it also needs a private collection to maintain the bits of products it is interested in: ({id, name, image, shortDescription, [maybe some others]}), so that when a new purchase event comes in, you have somewhere to get the initial state of those product fields from. Since your UserPurchasesView needs to listen to some of those product events anyway in order to keep up to date when a product changes, this isn't really much extra work, and avoids any dependency on another projection (ProductDetailsView). The cross-projection dependency also has a potential problem due to eventual consistency - what if the product isn't even in the product details view yet when the UserPurchasedProduct event comes through?
To avoid any concurrency issues, it's simplest to have each projection managed only by a single process and a single thread. That way, as long as the projection can receive events in-order across streams (so that it is guaranteed to see the product creation before the product purchase), you won't have issues with seeing a purchase before the product exists. If you introduce sharding or any other multi-threading to your projection, it gets more complicated.
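A minimal single-threaded sketch of this third option in Python. The event names come from the question; the event payload fields (name, image), the generated purchaseId and the class name are assumptions made for the example. The projection keeps a private product lookup next to the read model it owns.

```python
class UserPurchasesProjection:
    """Builds UserPurchasesView rows without depending on another projection."""

    def __init__(self):
        self._products = {}   # private lookup: productId -> product fields
        self.purchases = []   # the read model rows served to queries

    def handle(self, event):
        kind = event["type"]
        pid = event.get("productId")
        if kind == "ProductCreated":
            self._products[pid] = {"id": pid, "name": event["name"],
                                   "image": event["image"]}
        elif kind == "ProductRenamed":
            self._update_product(pid, "name", event["name"])
        elif kind == "ProductImageChanged":
            self._update_product(pid, "image", event["image"])
        elif kind == "UserPurchasedProduct":
            # Copy the current product fields into the new purchase row.
            self.purchases.append({
                "purchaseId": len(self.purchases) + 1,  # placeholder id
                "userId": event["userId"],
                "product": dict(self._products[pid]),
            })
        # Any other event type is ignored by this projection.

    def _update_product(self, pid, field, value):
        self._products[pid][field] = value
        # Keep already stored purchase rows in sync with the change.
        for row in self.purchases:
            if row["product"]["id"] == pid:
                row["product"][field] = value
```

This version always shows the latest product name on old purchases. If the business needs the name as it was at purchase time, the loop in _update_product would simply be dropped.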

DDD/CQRS Querying Events

I was looking at posts on querying in applications designed with the Event Sourcing/DDD/CQRS approach.
As I understand it, events are changes to the state of a domain object. The changes to state will be maintained as history/events in a DB (either SQL or NoSQL).
If a user wants to query the current state of a particular aggregate root, it will involve fetching the history of events.
When a user runs queries, especially business-specific queries, he/she will be interested in the current state, not the history of events.
How does querying, or the 'Q' part in CQRS, work with event sourcing?
Consider I have a domain object "Account" as aggregate root. The account AR will go through lots of changes, i.e. credits and debits. The event store will have credit and debit events.
Consider that a user needs to get the current balance of an account. How will a stream of historical events suit this? How will the user fetch the current balance for a given account?
I am unable to understand how the history of events will be useful for business-specific querying.
-Prakhyat M M
I would recommend you to read more articles from Greg Young (He is like the father of CQRS and Event Sourcing), like this: CQRS, Task Based UIs, Event Sourcing... agh.
Sorry for my bad English, I am from Paraguay. But I really like DDD - CQRS - ES and I would like to try to make a point.
The use of "Projections" (also known as Materialized Views) and the concept of "Eventual Consistency" are the fundamentals that every practitioner of CQRS should understand very well. The Event Store is not for querying. It is on the Command side of CQRS, not on the Query side. You may use a bus to send the events stored in the Event Store to the query side in order to process them and generate a read model, or view models, which you can query. In any case, an event store per se is not a query model.
Looks like you are a Java guy, but, still, you may want to check the CQRS Journey from Microsoft.
Hope this helps a little bit and motivates you to do more research on DDD / CQRS / ES, the New Trio of Line of Business Applications.
You'll use a projection of the event stream into the read model, which contains exactly the information that the Query side (Q) needs. For example, you could have an "account balance" projection that follows all events that change the account balance, but possibly ignores other events in the account's stream (such as owner changes). The projection then saves that info in a way that it can be queried very quickly, e.g., in memory or in a small read-model database table (accountId, balance) with the accountId as the key (the database can be a key-value store, for example).
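A small Python sketch of such an "account balance" projection, kept in memory. The event type names (Credited, Debited, OwnerChanged) and the payload fields are assumptions made for the example, not taken from any particular framework.

```python
class AccountBalanceProjection:
    """Read model: accountId -> current balance, built from the event stream."""

    def __init__(self):
        self.balances = {}  # plays the role of the (accountId, balance) table

    def apply(self, event):
        account = event["accountId"]
        if event["type"] == "Credited":
            self.balances[account] = self.balances.get(account, 0) + event["amount"]
        elif event["type"] == "Debited":
            self.balances[account] = self.balances.get(account, 0) - event["amount"]
        # Events that do not change the balance (e.g. owner changes) are ignored.

    def balance(self, account_id):
        # A query is a single key lookup, with no replay of the history.
        return self.balances.get(account_id, 0)


# Hypothetical event stream for one account.
events = [
    {"type": "Credited", "accountId": "acc-1", "amount": 100},
    {"type": "Debited", "accountId": "acc-1", "amount": 30},
    {"type": "OwnerChanged", "accountId": "acc-1", "owner": "someone"},
]

projection = AccountBalanceProjection()
for event in events:
    projection.apply(event)
```

After the loop, projection.balance("acc-1") answers the "current balance" question directly, while the event store still holds the full history.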
I suggest further reading on the CQRS concept such as this one or this one.
Interestingly enough, more people have recently discovered using the event store as the read model, leaving projections and "proper" read models until absolutely necessary.
We all know that dealing with projections increases the complexity. At minimum you have to create new models, establish the DAL for the read model, create projections to translate events into read model changes, and bind those projections to the stream of events from your store. It requires more code and more moving parts, and some of them are not easy to test. Schema changes at the read side also require migrations.
It appears that for many scenarios reading all events (properly partitioned) might be enough to have your "read model". It takes not much time until the system really grows large so you need to read tens of thousands of events to create one UI screen. But before you reach this point, you can just read events. Maybe use the file system to store events, although tools like EventStore are free and quite easy to use. Maybe add some indexing.
This approach lets you stabilise the domain significantly, get more knowledge about how the system works, tune the events and be really prepared to bring the "proper" read model into the system, but you might not have to.
Adam Dymitruk has written a blog post about it; you might find it worth reading even if you don't want to take this approach. Greg Young also gave a talk, EventStore as read model, back in 2012.

What is the correct data model for storing user relationships in Cassandra (i.e. Bob follows John)

I have a system where actions of users need to be sent to other users who subscribe to those updates. There aren't a lot of users/subscribers at the moment, but it could grow rapidly so I want to make sure I get it right. Is it just this simple?
create table subscriptions (
    person_uuid uuid,
    subscribes_person_uuid uuid,
    primary key (person_uuid, subscribes_person_uuid)
);
I need to be able to look up things in both directions, i.e. answer the questions:
Who are Bob's subscribers?
Who does Bob subscribe to?
Any ideas, feedback, suggestions would be useful.
Those two queries represent the start of your model:
you want the user to be the PK or part of the PK.
depending on the cardinality of subscriptions/subscribers you could go with:
for low numbers: using a single table and two sets
for high numbers: using 2 tables similar to the one you describe
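To show the two-table idea without a cluster, here is an in-memory Python sketch. Each dict mimics one table keyed by the person you query on, and every write goes to both so each direction is a single-key lookup. The class and method names are made up for the example.

```python
from collections import defaultdict


class Subscriptions:
    def __init__(self):
        # "Table" 1: person -> people they subscribe to.
        self.subscriptions_by_person = defaultdict(set)
        # "Table" 2: person -> people subscribed to them.
        self.subscribers_by_person = defaultdict(set)

    def subscribe(self, subscriber, target):
        # Denormalized write: one row per direction.
        self.subscriptions_by_person[subscriber].add(target)
        self.subscribers_by_person[target].add(subscriber)

    def unsubscribe(self, subscriber, target):
        self.subscriptions_by_person[subscriber].discard(target)
        self.subscribers_by_person[target].discard(subscriber)

    def subscribed_to(self, person):
        """Who does `person` subscribe to?"""
        return sorted(self.subscriptions_by_person.get(person, ()))

    def subscribers_of(self, person):
        """Who are `person`'s subscribers?"""
        return sorted(self.subscribers_by_person.get(person, ()))
```

The cost of this layout is that the two structures can drift apart if one of the two writes fails, so the writes should be applied together.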
@Jacob
Your use case looks very similar to the Twitter example. I modeled it here.
If you want to track both sides of the relationship, you'll need to have a dedicated table to index them.
Last but not least, depending on whether the users are mutable or not, you can decide to denormalize (e.g. duplicate User content) or just store user ids and then fetch the users' content from a separate table.
I've implemented a simple join feature in Achilles. Have a look if you want to go this way.

Resources