MongoDB transactions, scalability and WriteConflict - Node.js

I am using:
MongoDB (v4.2.10)
Node.js + Mongoose
I am still in the development phase of my application, and I am facing a potential problem (WriteConflict) with transactions in MongoDB.
My application lets users add posts and like posts.
When a post is liked, here is what happens on the back-end side (a sketch follows the list):
Start transaction
Find the post with the given ID in the Post collection
Add a document to the Like collection
Update the like count of the post in the Post collection
Update the likes given count of the user in the User collection
Update the likes received count of the user in the User collection
Add a notification document in the Notification collection
Commit the transaction
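Roughly, the handler looks like this (a simplified sketch, not my exact code; model and field names are illustrative):

const mongoose = require('mongoose');
// Post, Like, User and Notification are Mongoose models.

async function likePost(postId, likerId) {
  const session = await mongoose.startSession();
  try {
    await session.withTransaction(async () => {
      // Find the post with the given ID.
      const post = await Post.findById(postId).session(session);
      if (!post) throw new Error('Post not found');

      // Add a document to the Like collection.
      await Like.create([{ post: postId, user: likerId }], { session });

      // Update the like count of the post.
      await Post.updateOne({ _id: postId }, { $inc: { likeCount: 1 } }, { session });

      // Update the likes given/received counts of the two users.
      await User.updateOne({ _id: likerId }, { $inc: { likesGiven: 1 } }, { session });
      await User.updateOne({ _id: post.author }, { $inc: { likesReceived: 1 } }, { session });

      // Add a notification document.
      await Notification.create([{ user: post.author, post: postId, type: 'like' }], { session });
    });
  } finally {
    session.endSession();
  }
}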
I'd say the average execution time is 1 second, which means that for 1 second I am locking:
1 Post document
2 User documents
I can see that becoming a huge scalability problem, especially if a user has several popular posts that are often liked by others.
What would you recommend I do?
I don't think I can stop using transactions, because if something goes wrong during execution I want to revert any changes already made to the DB.
Thanks

Transactions are not required for this.
Once a like is in the likes collection, you can recalculate all counts.
Notifications cannot be recalculated (since you don't know when one was sent), but they are generally ephemeral anyway, and if you have an outage requiring a database restore, users will most likely forgive some lost notifications.
When updating counts, use $inc instead of read-modify-write (writing out the new value).
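For example, assuming a Post model with a likeCount field (inside an async handler):

// Atomic: the server applies the increment, so concurrent likes
// cannot overwrite each other.
await Post.updateOne({ _id: postId }, { $inc: { likeCount: 1 } });

// Read-modify-write: two concurrent requests can read the same old
// value, and one increment is silently lost. Avoid this.
const post = await Post.findById(postId);
post.likeCount += 1;
await post.save();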

Related

How to Prevent Concurrent Querying While Series of Transactions is Being Executed in MongoDB?

I have been trying to solve an issue: preventing another query from being served until a series of transactions has been completed.
I am thinking that when a user fires two or more simultaneous requests at the Node server, it might cause issues when reading from/writing to MongoDB, posing a security risk. The pseudocode looks like this:
// When a user buys something, check that the user has sufficient
// balance, then send a delivery order and deduct the balance.
app.post('/buy', async (req, res) => {
  // Step 1: Get the balance from MongoDB with Mongoose
  const doc = await Balance.findOne(...);
  // Step 2: If the balance is sufficient, issue a delivery order
  if (doc.balance >= price) {
    // run delivery order code to deliver the item
  }
  // Step 3: Deduct the balance, then write the new balance back to the database
  const newBalance = doc.balance - price;
  await Balance.findOneAndUpdate(...);
});
Here lies the problem: if the user fires two or more requests simultaneously, each might get a chance to read the database before any write happens. Because of this, each request passes the balance check and completes successfully with a delivery order. If the user only has enough balance to buy one item, this causes a 'double spending' problem, because the user will have successfully bought more than once.
My question for this situation is: how do I prevent the next query (which will arrive within milliseconds of the previous one) from running Step 1 until the previous query has completed all its steps (until Step 3 is finished)?
The MongoDB documentation mentions concurrency and locking, but it does not state how to use them across a series of operations like the above. It is also unclear whether so-called 'multi-document transactions' apply to this situation, and there is no code showing how to use them. Stack Overflow has only a few related questions, and the solutions there are vague; almost none include solution code as a reference.
MongoDB does implement transactions as of 4.0, so you should use them if you are looking for transactional behavior across multiple operations.
The linked page also provides examples in various languages.
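For example, with the Node.js driver it could look like this (collection and field names are illustrative, not from your code). Note that putting the balance check into the update filter closes the race by itself, because the filter and the $inc apply atomically to the document:

const { MongoClient } = require('mongodb');

async function buy(client, userId, price) {
  const session = client.startSession();
  try {
    await session.withTransaction(async () => {
      const db = client.db('shop');

      // Steps 1 + 2: deduct only if the balance is sufficient. Filter
      // and update are applied atomically, so two concurrent requests
      // cannot both pass the balance check.
      const result = await db.collection('balances').updateOne(
        { _id: userId, balance: { $gte: price } },
        { $inc: { balance: -price } },
        { session }
      );
      if (result.modifiedCount === 0) throw new Error('insufficient balance');

      // Step 3: issue the delivery order in the same transaction, so
      // it is rolled back if anything above fails.
      await db.collection('orders').insertOne(
        { user: userId, price: price, createdAt: new Date() },
        { session }
      );
    });
  } finally {
    await session.endSession();
  }
}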

MongoDB unnormalized data and the change stream

I have an application in which most collections are read far more often than they are written, so I denormalized the data in them. Now I need to keep the denormalized copies in sync. For some collections I used scheduled jobs to sync the data, but that is not good enough, as in some cases I need the data to be synced in real time.
For example:
for example:
Let's say I have an orders collection and a users collection.
Orders store the user's email (for searching):
{
  _id: ObjectId(),
  user_email: 'test@email.email',
  ...
}
Now, whenever I change a user's email in users, I want to change it in orders as well.
I found that MongoDB has change streams, which look like an awesome feature. I have played with them a bit, and they give me the results I need to update my other collections. My questions: does anyone use change streams in production? Can I rely on the stream to always deliver the updates I need to keep the other collections in sync? How does it affect DB performance if I have many streams open? Also, I use the Node.js MongoDB driver - does that have any effect?
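Roughly what I have been playing with (a simplified sketch; it assumes orders also stores a user_id):

// Watch the users collection for updates that touch the email field.
const changeStream = db.collection('users').watch([
  { $match: { operationType: 'update',
              'updateDescription.updatedFields.email': { $exists: true } } }
]);

changeStream.on('change', async (change) => {
  // Propagate the new email to the denormalized copies in orders.
  await db.collection('orders').updateMany(
    { user_id: change.documentKey._id },
    { $set: { user_email: change.updateDescription.updatedFields.email } }
  );
});

From what I read, the stream also hands back a resume token, so after a restart it can pick up where it left off.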
I haven't worked with change streams yet, but cases like this are very common and can easily be solved by building a more normalized schema.
First normal form says, among other things, "don't repeat data" - so you save the email in the users collection only.
The orders collection then won't have an email field, but will have a user_id for joining with the users collection via the $lookup aggregation stage (example below):
https://docs.mongodb.com/manual/reference/operator/aggregation/lookup/
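A sketch of that join with the Node.js driver:

// Join each order with its user and read the email at query time,
// instead of storing a copy of it on every order.
const orders = await db.collection('orders').aggregate([
  { $lookup: {
      from: 'users',          // collection to join with
      localField: 'user_id',  // field in orders
      foreignField: '_id',    // field in users
      as: 'user'
  } },
  { $unwind: '$user' },
  { $project: { _id: 1, user_email: '$user.email' } }
]).toArray();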

If a multi update fails partially in mongodb, how to roll back?

I understand that a multi update is not atomic and there is a chance that the update query will fail partway through. The error can be found via getLastError. But what if I want to roll back the updates that have already been done by that particular query? Is there any method simpler than tedious two-phase commits?
For instance, say I have a simple collection of users and their phone models. Now I do a multi update on all the users who have a Nexus 4, upgrading them to a Nexus 5 (dreamy, isn't it?). The only condition is all or none - all the N5s are taken back if even one N4 user doesn't get his. Now, somehow MongoDB fails partway through, and I am stuck with a few users on the N4 and a few on the N5. From what I have gathered on the net, I can't have Mongo roll back directly; if the operation failed, I have to manually update the N5 users back to N4, and if that fails too, keep retrying.
Alternatively, for a complicated collection, I would have to keep a new key, viz. status, and update it with keywords like 'updating' and 'updated' (a sketch follows).
This is what I understand. I wanted to know if there is any simpler way. I gather from the comments that the answer is a big no.
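For example, the status-key idea might look like this (a rough sketch):

// Phase 1: mark each document as in-flight while updating it, so a
// partial failure can be reverted without touching users who already
// owned a Nexus 5 before this operation.
try {
  await db.collection('users').updateMany(
    { phone: 'Nexus 4' },
    { $set: { phone: 'Nexus 5', status: 'updating' } }
  );
} catch (err) {
  // Revert only the documents still marked as in-flight; this revert
  // can itself fail and has to be retried until it succeeds.
  await db.collection('users').updateMany(
    { status: 'updating' },
    { $set: { phone: 'Nexus 4' }, $unset: { status: '' } }
  );
  throw err;
}
// Phase 2: all updates succeeded, so clear the markers.
await db.collection('users').updateMany(
  { status: 'updating' },
  { $set: { status: 'updated' } }
);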

How to account for a failed write or add process in Mongodb

So I've been trying to wrap my head around this one for weeks, but I just can't seem to figure it out. MongoDB isn't equipped to deal with rollbacks as we typically understand them (e.g. a client adds information to the database, like a username, but quits in the middle of the registration process). The DB is then left with some "hanging" information that isn't associated with anything. How can MongoDB handle that? Or, if no one can answer that question, can you point me to a source or example that does? Thanks.
MongoDB (prior to version 4.0) does not support transactions: you can't perform atomic multi-statement transactions to ensure consistency; you can only perform atomic operations on a single document at a time. When dealing with NoSQL databases you need to validate your data as much as you can, because they seldom complain. There are some workarounds or patterns to achieve SQL-like transactions. In your case, for example, you can store the user's information in a temporary collection, check the data's validity, and move it to the users collection afterwards.
That should be straightforward, but things get more complicated when dealing with multiple documents. In that case, you need to create a designated collection for transactions. For instance:
// transactions collection
{
  _id: ..,
  state: 'new_transaction',
  value1: <values from document_1 before updating it>,
  value2: <values from document_2 before updating it>
}
// update document 1
// update document 2
Ooohh!! something went wrong while updating document 1 or 2? No worries, we can still restore the old values from the transaction collection.
This pattern is known as compensation to mimic the transactional behavior of SQL.
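A rough sketch of the pattern with the Node.js driver (collection and field names are illustrative):

async function updateTwoDocuments(db, id1, id2, changes1, changes2) {
  const docs = db.collection('documents');
  const transactions = db.collection('transactions');

  // Save the old values before touching anything.
  const before1 = await docs.findOne({ _id: id1 });
  const before2 = await docs.findOne({ _id: id2 });
  const { insertedId } = await transactions.insertOne({
    state: 'new_transaction',
    value1: before1,
    value2: before2
  });

  try {
    // update document 1, then document 2
    await docs.updateOne({ _id: id1 }, { $set: changes1 });
    await docs.updateOne({ _id: id2 }, { $set: changes2 });
    await transactions.updateOne({ _id: insertedId }, { $set: { state: 'done' } });
  } catch (err) {
    // Something went wrong: compensate by restoring the saved values.
    await docs.replaceOne({ _id: id1 }, before1);
    await docs.replaceOne({ _id: id2 }, before2);
    await transactions.updateOne({ _id: insertedId }, { $set: { state: 'rolled_back' } });
    throw err;
  }
}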

Processing a stream in Node where action depends on asynchronous calls

I am trying to write a Node program that takes a stream of data (using xml-stream), consolidates it, and writes it to a database (using Mongoose). I am having problems figuring out how to do the consolidation, since the data may not have hit the database by the time I process the next record. I am trying to do something like:
on order data being read from stream:
  look to see if the customer exists in the MongoDB collection
  if the customer exists:
    add the order to the document
  else:
    create the customer record with just this order
  save the customer
My problem is that two 'nearby' orders for a customer cause duplicate customer records to be written, since the first one hasn't been written by the time the second one checks whether the customer is there.
In theory I think I could get around the problem by pausing the xml-stream, but there is a bug preventing me from doing this.
Not sure that this is the best option, but using an async queue is what I ended up doing.
Around the same time, a pull request for xml-stream (which is what I was using to process the stream) that allowed pausing was merged.
Is there a unique field on the customer object in the data coming from the stream? You could add a unique constraint to your Mongoose schema to prevent duplicates at the database level.
When creating new customers, add some fallback logic to handle the case where you try to create a customer but that same customer has just been created by another save. When this happens, retry: fetch the existing customer document and add the order to it (see the sketch below).
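For example, assuming the customer's email is the unique field (a sketch):

const mongoose = require('mongoose');

const customerSchema = new mongoose.Schema({
  // unique: true creates a unique index, so a second insert with the
  // same email is rejected by the database with error code 11000.
  email: { type: String, required: true, unique: true },
  orders: [{ sku: String, qty: Number }]
});
const Customer = mongoose.model('Customer', customerSchema);

async function addOrder(email, order) {
  try {
    // Create the customer record with just this order.
    await Customer.create({ email: email, orders: [order] });
  } catch (err) {
    if (err.code !== 11000) throw err;
    // Duplicate key: a 'nearby' order created this customer first,
    // so fall back to pushing the order onto the existing document.
    await Customer.updateOne({ email: email }, { $push: { orders: order } });
  }
}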
