For example, we have the following DB structure:
Users
Items
Games
Items belong to Users (a userId field on the item), i.e. a user's inventory.
Users can create games.
Users can join created games. Only two users are allowed to join a game.
When a game is completed, all items from both users' inventories should be transferred to the winner's inventory.
I want to build a flow in NodeJs to handle the "joinGame" and "winGame" cases.
In a MySQL database I used transactions to lock the "game" record and perform all operations in a single transaction to ensure data consistency.
I have no idea how to do this correctly in Mongo.
Digging around on the internet, I found two solutions:
Write a worker that cycles through "game" records to finalize them.
Use an app-level lock (in the Node process).
Which of these is the correct approach, or is there a better way?
Thank you in advance!
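For illustration, here is a minimal sketch (not a definitive implementation) of how the two cases could be handled with the Node.js MongoDB driver: joinGame as a single atomic update, and winGame as a multi-document transaction (MongoDB 4.0+ on a replica set). The collection and field names (games, items, players, status, winner) are assumptions, not taken from the real schema.

const { MongoClient } = require('mongodb');

// Usage (connection string and db name are placeholders):
//   const client = await MongoClient.connect('mongodb://localhost:27017');
//   const db = client.db('game_app');

// joinGame: one atomic update. The filter only matches a game that is still
// open, does not already contain this user and has fewer than two players,
// so two concurrent joins cannot both succeed on a full game.
async function joinGame(db, gameId, userId) {
  const res = await db.collection('games').updateOne(
    {
      _id: gameId,
      status: 'open',
      players: { $ne: userId },
      $expr: { $lt: [{ $size: '$players' }, 2] }
    },
    { $push: { players: userId } }
  );
  return res.modifiedCount === 1; // false => full, closed or already joined
}

// winGame: a transaction marks the game finished and moves every item owned
// by either player to the winner, all-or-nothing.
async function winGame(client, db, gameId, winnerId) {
  const session = client.startSession();
  try {
    await session.withTransaction(async () => {
      const game = await db.collection('games')
        .findOne({ _id: gameId, status: 'open' }, { session });
      if (!game) throw new Error('game not found or already finalized');
      await db.collection('games').updateOne(
        { _id: gameId },
        { $set: { status: 'finished', winner: winnerId } },
        { session }
      );
      await db.collection('items').updateMany(
        { userId: { $in: game.players } },
        { $set: { userId: winnerId } },
        { session }
      );
    });
  } finally {
    await session.endSession();
  }
}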
I have an application in which most of the collections are much more heavily read than written, so I denormalized the data in them, and now I need to keep that denormalized data in sync. For some collections I used jobs to sync the data, but that is not good enough, because in some cases the data needs to be synced in real time.
For example:
Let's say I have an orders collection and a users collection.
Orders store the user's email (for searching):
{
  _id: ObjectId(),
  user_email: 'test@email.email',
  ...
}
Now, whenever I change the user's email in users, I want to change it in orders as well.
So I found that MongoDB has change streams, which look like a pretty awesome feature. I have played with them a bit and they give me the results I need to update my other collections. My questions: does anyone use change streams in production? Can I rely on the stream to always deliver the updates needed to keep the other collections in sync? How does it affect DB performance if I have many streams open? Also, I use the Node.js MongoDB driver; does that have any effect?
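For reference, a rough sketch (not a production recipe) of such a change-stream consumer with the Node.js driver. It assumes the users collection stores the address in an email field, that orders also keep a user_id reference for matching (matching by the old email would need pre-images, MongoDB 6.0+), and that the connection details and names are placeholders. Change streams require a replica set or sharded cluster.

const { MongoClient } = require('mongodb');

async function syncOrderEmails() {
  const client = await MongoClient.connect('mongodb://localhost:27017'); // placeholder URI
  const db = client.db('shop');                                          // placeholder db name

  // Watch only updates that touch users.email; fullDocument: 'updateLookup'
  // returns the post-update user document alongside the change event.
  const stream = db.collection('users').watch(
    [{ $match: { operationType: 'update', 'updateDescription.updatedFields.email': { $exists: true } } }],
    { fullDocument: 'updateLookup' }
  );

  for await (const change of stream) {
    const userId = change.documentKey._id;
    const newEmail = change.updateDescription.updatedFields.email;
    await db.collection('orders').updateMany(
      { user_id: userId },                 // assumed reference field on orders
      { $set: { user_email: newEmail } }
    );
  }
}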
I haven't worked with change streams yet, but these cases are very common and can easily be solved by building a more normalized schema.
The first normal form says, among other things, "don't repeat data", so you would store the email in the users collection only.
The orders collection won't have an email field, but will have a user_id for joining with the users collection via the $lookup aggregation stage:
https://docs.mongodb.com/manual/reference/operator/aggregation/lookup/
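For illustration, such a join could look roughly like this (field names are assumed):

db.orders.aggregate([
  // Join each order with its user through the stored user_id reference
  { $lookup: { from: 'users', localField: 'user_id', foreignField: '_id', as: 'user' } },
  { $unwind: '$user' },
  // Searching by email now goes through the joined user document
  { $match: { 'user.email': 'test@email.email' } }
])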
We have two separate sets of documents in Cosmos DB: one storing a User and its various roles, and a second set storing the permissions for a particular job.
Now, the job list is unbounded and can grow substantially over time. As GROUP BY is not allowed across multiple documents, we are trying to figure out the best strategy for retrieving all users based on either a role or a particular job.
1) Solution 1: Keep user data and job data as sub-documents in one big document, which helps with querying and even continuation tokens.
2) Solution 2: Keep the user and role data in one document and the jobs in multiple documents, then query each separately and combine the results on the client side. In this case continuation-token support is lost, as you have to fetch the complete data first to produce any meaningful results.
3) Solution 3: Keep the role data with each job document and query it directly (sketched below). In this case we get the users for a job first, and then make a single query per user to get their information.
Can anyone recommend a better solution, or pick one of the three above and suggest a path forward?
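For illustration only, Solution 3 could end up with document shapes roughly like these (property names are invented for the example, not taken from the actual documents):

// Job document carrying the role/permission data it is queried by (Solution 3).
const jobDocument = {
  id: 'job-001',
  name: 'Nightly export', // illustrative job field
  roles: [
    { roleId: 'role-admin', userIds: ['user-1', 'user-2'] },
    { roleId: 'role-viewer', userIds: ['user-3'] }
  ]
};

// Slim user document; after querying jobs for the matching userIds,
// a single query per user fetches the rest of their information.
const userDocument = {
  id: 'user-1',
  displayName: 'Example User',
  roles: ['role-admin']
};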
It seems you need extra storage for the relationships. You could use Azure SQL to store the relationships between user (documentId, userId, roleId), role, and job, and then store the remaining, less structured property information, such as the user info, in DocumentDB.
I'm using a DDD/CQRS/ES approach and I have some questions about modeling my aggregate(s) and queries. As an example consider the following scenario:
A User can create a WorkItem, change its title and associate other users to it. A WorkItem has participants (associated users) and a participant can add Actions to a WorkItem. Participants can execute Actions.
Let's just assume that Users are already created and I only need userIds.
I have the following WorkItem commands:
CreateWorkItem
ChangeTitle
AddParticipant
AddAction
ExecuteAction
These commands must be idempotent, so I can't add the same user or action twice.
And the following query:
WorkItemDetails (all info for a work item)
Queries are updated by handlers that handle domain events raised by WorkItem aggregate(s) (after they're persisted in the EventStore). All these events contain the WorkItemId. I would like to be able to rebuild the queries on the fly, if needed, by loading all the relevant events and processing them in sequence. This is because my users usually won't access WorkItems created one year ago, so I don't need to have these queries processed. So when I fetch a query that doesn't exist, I could rebuild it and store it in a key/value store with a TTL.
Domain events have an aggregateId (used as the event streamId and shard key) and a sequenceId (used as the eventId within an event stream).
So my first attempt was to create a large Aggregate called WorkItem that had a collection of participants and a collection of actions. Participants and Actions are entities that live only within a WorkItem. A participant references a userId and an action references a participantId. They can have more information, but it's not relevant for this exercise. With this solution my large WorkItem aggregate can ensure that the commands are idempotent because I can validate that I don't add duplicate participants or actions, and if I want to rebuild the WorkItemDetails query, I just load/process all the events for a given WorkItemId.
This works fine: since I only have one aggregate, the WorkItemId can be the aggregateId, so when I rebuild the query I just load all events for a given WorkItemId.
However, this solution has the performance issues of a large Aggregate (why load all participants and actions to process a ChangeTitle command?).
So my next attempt is to have different aggregates, all with the same WorkItemId as a property, but where only the WorkItem aggregate has it as its aggregateId. This fixes the performance issues, and I can still update the query because all events contain the WorkItemId, but now my problem is that I can't rebuild it from scratch, because I don't know the aggregateIds of the other aggregates, so I can't load their event streams and process them. They have a WorkItemId property, but that's not their real aggregateId. Also, I can't guarantee that I process events sequentially, because each aggregate has its own event stream, but I'm not sure if that's a real problem.
Another solution I can think of is to have a dedicated event stream that consolidates all WorkItem events raised by the multiple aggregates. I could have event handlers that simply append the events fired by the Participants and Actions to an event stream whose id would be something like "{workItemId}:allevents". This would be used only to rebuild the WorkItemDetails query. It sounds like a hack, though: basically I'm creating an "aggregate" that has no business operations.
What other solutions do I have? Is it uncommon to rebuild queries on the fly? Can it be done when events from multiple aggregates (multiple event streams) are used to build the same query? I've searched for this scenario and haven't found anything useful. I feel like I'm missing something that should be very obvious, but I haven't figured out what.
Any help on this is very much appreciated.
Thanks
I don't think you should design your aggregates with querying concerns in mind. The Read side is here for that.
On the domain side, focus on consistency concerns (how small can the aggregate be while the domain still remains consistent in a single transaction?), concurrency (how big can it be without suffering concurrent-access problems / race conditions?) and performance (would we load thousands of objects into memory just to perform a simple command? Exactly what you were asking).
I don't see anything wrong with on-demand read models. It's basically the same as reading from a live stream, except you re-create the stream when you need it. However, this might be quite a lot of work for not much extra gain, because most of the time entities are queried just after they are modified. If on-demand becomes "basically every time the entity changes", you might as well subscribe to live changes. As for "old" views, the definition of "old" is that they are not modified any more, so they don't need to be recalculated anyway, regardless of whether you have an on-demand or continuous system.
If you go the multiple-small-aggregates route and your Read Model needs information from several sources to update itself, you have a couple of options:
Enrich emitted events with additional data
Read from multiple event streams and consolidate their data to build the read model (see the sketch after this list). No magic here: the Read side needs to know which aggregates are involved in a particular projection. You could also query other Read Models if you know they are up-to-date and will give you just the data you need.
See CQRS events do not contain details needed for updating read model
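As a sketch of the second option, here is an on-the-fly rebuild of the WorkItemDetails read model using a hypothetical event-store API (loadStream(streamId) returning events ordered by sequenceId). It assumes the WorkItem stream records the ids of the Participants/Actions it spawned, which is one way for the read side to know which streams are involved; event names and shapes are illustrative, not taken from the original model.

// Hypothetical API: eventStore.loadStream(streamId) -> [{ type, timestamp, data }, ...]
async function rebuildWorkItemDetails(eventStore, workItemId) {
  // The WorkItem stream tells us which related aggregates exist.
  const workItemEvents = await eventStore.loadStream(workItemId);
  const participantIds = workItemEvents
    .filter(e => e.type === 'ParticipantAdded')
    .map(e => e.data.participantId);
  const actionIds = workItemEvents
    .filter(e => e.type === 'ActionAdded')
    .map(e => e.data.actionId);

  // Load the related streams and consolidate everything into one ordered list.
  // Cross-stream ordering by timestamp is an assumption; a global sequence
  // number would be more reliable.
  const relatedStreams = await Promise.all(
    [...participantIds, ...actionIds].map(id => eventStore.loadStream(id))
  );
  const allEvents = [...workItemEvents, ...relatedStreams.flat()]
    .sort((a, b) => a.timestamp - b.timestamp);

  // Fold the events into the WorkItemDetails view.
  const details = { workItemId, title: null, participants: [], actions: [] };
  for (const e of allEvents) {
    switch (e.type) {
      case 'WorkItemCreated':
      case 'TitleChanged':
        details.title = e.data.title;
        break;
      case 'ParticipantAdded':
        details.participants.push(e.data.userId);
        break;
      case 'ActionAdded':
        details.actions.push({ id: e.data.actionId, executed: false });
        break;
      case 'ActionExecuted': {
        const action = details.actions.find(a => a.id === e.data.actionId);
        if (action) action.executed = true;
        break;
      }
    }
  }
  return details;
}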
I have a MongoDB database with 2 collections:
groups: { group_slug, members }
users: { id, display name, groups }
All changes to groups are made by changing the members array of the group to include the user ids.
I want to sync these changes across to the users collection by using map/reduce. How can I output the results of map/reduce into an existing collection (without merging or reducing)?
My existing code is here: https://gist.github.com/morgante/5430907
How can I output the results of map/reduce into an existing collection
You really can't do it this way. Nor is this really suggested behaviour. There are other solutions:
Solution #1:
Output the map / reduce into a temporary collection
Run a follow-up task that updates the primary data store from the temporary collection (sketched below)
Clean up the temporary collection
Honestly, this is a safe way to do this. You can implement some basic retry logic in the whole loop.
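A minimal sketch of the follow-up task in Solution #1, assuming the map/reduce output went to a temporary collection (classic mapReduce output documents have the shape { _id, value }) and that the reduced value carries the group slugs; collection names are placeholders.

// Follow-up task: copy the reduced group memberships from the temporary
// collection into users, then drop the temporary collection.
async function syncUserGroups(db) {
  const cursor = db.collection('tmp_user_groups').find();
  for await (const doc of cursor) {
    // doc: { _id: <user id>, value: { groups: [...] } } (assumed reduce output)
    await db.collection('users').updateOne(
      { id: doc._id },                        // users are keyed by "id" in the question
      { $set: { groups: doc.value.groups } }
    );
  }
  await db.collection('tmp_user_groups').drop();
}

Retry logic could wrap the updateOne call, and the drop can be deferred until every update has succeeded.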
Solution #2:
Put the change on a queue (e.g. "user subscribes to group").
Update both collections from separate workers that listen for such events on the queue.
This solution may require a separate piece (the queue), but any large system is going to have such denormalization problems. So this will not be the only place you see this.
We have two entities, User and Role. One User can have multiple Roles, and a single Role can be shared by many Users: a typical m:n relation.
Roles are also dynamic, and we expect a large number of them (millions).
It is quite simple to model such data in a relational DB. I would like to find out whether it is possible in Cassandra.
Currently I see two solutions:
A) Use a normalized model and create something similar to an inner join
Create each Role in a separate CF and store foreign keys to the referenced Roles in the User record.
Pro: Roles are not replicated and maintenance is simple.
Contra: In order to get all Roles for a single User, multiple network calls are necessary. The User record contains only FKs; the Roles are stored using the random partitioner, so each Role could end up on a different Cassandra node.
B) Denormalize the model and replicate roles to avoid round trips
In this scenario the User record in Cassandra contains a copy of all of the user's Roles.
Pro: It is possible to read a User with all Roles in a single query, which guarantees short load times.
Contra: Each shared Role is copied multiple times, once onto each related User. Maintaining roles is very difficult, especially with a large amount of data. For example: one Role is shared by 1000 Users, so changes to this Role require updates to 1000 User records. For very large data sets such updates have to be executed as an asynchronous job.
The solutions above are very limited; maybe Cassandra is not the right solution for m:n relations? Do you know of any Cassandra design pattern for such a problem?
Thanks,
Maciej
The way you want to design a data store in Cassandra is to start with the queries you plan to execute and make it so you can get all the information you need at once. Denormalization is the name of the game here; if you're not replicating that role information in each user node, you're not going to avoid disk seeks, and your read performance will suffer. Joins do not make sense; if you want a relational database, use a relational database.
At a guess, you're going to ask a lot of questions about what roles a user has and what they should be doing with them, so you definitely want to have role information duplicated in each user entry - probably with each role getting its own column (role-ROLE_KEY => serialized-capability-info instead of roles => [serialized array of capability info]). Your application will need some way to iterate over all those columns itself.
You will probably want to look at what users are in a role, and so you should probably store all the user information you'll need for that view in the role column family as well (though a subset of the full user record will do).
When you run updates, and add/remove users from roles, you will need to make sure that you update both the role's list of users and the user's roles at the same time. Because you're using a column for each relation, instead of a single shared serialized blob, this should work even if you're editing two different roles that share the same user at the same time: Cassandra can merge the updates, including the deletes.
If the query needs to be asynchronous, then go make your application handle it. Remember that Cassandra is an eventual-consistency data store and you shouldn't expect updates to be visible everywhere immediately anyway.
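In today's CQL terms, the column-per-relation layout corresponds to a clustering column per relation. The "update both sides at the same time" step could be sketched like this with the DataStax Node.js driver; the table and column names are assumptions, not an established schema.

const cassandra = require('cassandra-driver');

// Placeholders: contact point, data center and keyspace depend on the cluster.
const client = new cassandra.Client({
  contactPoints: ['127.0.0.1'],
  localDataCenter: 'datacenter1',
  keyspace: 'app'
});

// Assumed tables:
//   roles_by_user (user_id, role_key, capability_info, PRIMARY KEY (user_id, role_key))
//   users_by_role (role_key, user_id, user_info,       PRIMARY KEY (role_key, user_id))
async function addUserToRole(userId, userInfo, roleKey, capabilityInfo) {
  const queries = [
    {
      query: 'INSERT INTO roles_by_user (user_id, role_key, capability_info) VALUES (?, ?, ?)',
      params: [userId, roleKey, capabilityInfo]
    },
    {
      query: 'INSERT INTO users_by_role (role_key, user_id, user_info) VALUES (?, ?, ?)',
      params: [roleKey, userId, userInfo]
    }
  ];
  // A logged batch keeps the two denormalized views from permanently diverging.
  await client.batch(queries, { prepare: true });
}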
Another option these days is to use playORM, which can do joins for you ;). You just decide how to partition your data. It uses Scalable JQL, which is a simple addition on top of JQL, as follows:
@NoSqlQuery(name="findJoinOnNullPartition", query="PARTITIONS t('account', :partId) select t FROM Trade as t INNER JOIN t.security as s where s.securityType = :type and t.numShares = :shares")
So, we can finally normalize our data on a noSQL system AND scale at the same time. We don't need to give up normalization which has certain benefits.
Dean