MongoDB denormalized data and change streams - Node.js

I have an application in which most collections are read far more often than they are written, so I denormalized the data in them, and now I need to keep the duplicated data in sync. For some collections I use jobs to sync the data, but that is not good enough, because in some cases I need the data to be synced in real time.
For example:
Let's say I have an orders collection and a users collection.
Orders contain the user's email (for search):
{
_id:ObjectId(),
user_email:'test#email.email'
....
}
Now whenever I change a user's email in users, I want to change it in orders as well.
I found that MongoDB has change streams, which look like a pretty awesome feature. I have played with them a bit and they give me the results I need to update my other collections. My questions: does anyone use them in production? Can I trust the stream to always deliver the updated data so that I can update the other collections? How does it affect DB performance if I have many streams open? Also, I use the Node.js MongoDB driver; does that have any effect?
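Roughly, the propagation described here could be sketched as follows. This is only an illustration under assumptions: the function name `orderUpdateForUserChange` is hypothetical, the user document is assumed to store the address in a field called `email`, and orders are assumed to carry a `user_id` next to `user_email` (an update event does not include the old email by default, so the orders need some stable key to be found by).

```javascript
// Turn a change-stream event from the "users" collection into an update
// for the "orders" collection. Returns null when the event does not
// change the email.
// Assumption: orders store a user_id next to the denormalized user_email.
function orderUpdateForUserChange(change) {
  if (change.operationType !== "update") return null;
  const fields =
    (change.updateDescription && change.updateDescription.updatedFields) || {};
  if (!("email" in fields)) return null;
  return {
    filter: { user_id: change.documentKey._id },
    update: { $set: { user_email: fields.email } },
  };
}

// Wiring with the Node.js driver (needs a replica set; not run here):
// const stream = db.collection("users").watch();
// stream.on("change", async (change) => {
//   const op = orderUpdateForUserChange(change);
//   if (op) await db.collection("orders").updateMany(op.filter, op.update);
// });
```

This sketch ignores "replace" events, and a process that restarts would also need to store the resume token (the `_id` of the last handled event) and pass it back through the `resumeAfter` option so that no change is missed.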

I have not worked with change streams yet, but these cases are very common and can be easily solved by building a more normalized schema.
Normalization says, among other things, "don't repeat data", so you would store the email in the users collection only.
The orders collection would not have the email field; instead it would have a user_id for joining with the users collection through the $lookup aggregation stage:
https://docs.mongodb.com/manual/reference/operator/aggregation/lookup/
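A minimal sketch of such a join, assuming the collections are named `orders` and `users`, that orders hold a `user_id`, and that the user's address lives in a field called `email` (the helper name `buildOrdersByEmailPipeline` is hypothetical):

```javascript
// Build an aggregation pipeline for the "orders" collection that joins
// each order with its user and keeps only orders of the given email.
// Collection and field names are assumptions for illustration.
function buildOrdersByEmailPipeline(email) {
  return [
    {
      $lookup: {
        from: "users",          // collection to join with
        localField: "user_id",  // field in orders
        foreignField: "_id",    // field in users
        as: "user",             // output array field
      },
    },
    { $unwind: "$user" },                // one user per order
    { $match: { "user.email": email } }, // filter on the joined field
  ];
}

// Usage with the Node.js driver (needs a live connection, not run here):
// const orders = await db.collection("orders")
//   .aggregate(buildOrdersByEmailPipeline("someone@example.com"))
//   .toArray();
```

Because the filter is applied to the joined field, it cannot use an index on orders. For a search by email it may be cheaper to find the user first and then query orders by user_id.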

Related

Getting a field from a large number of Firebase documents without it being too costly

I'm working on a project that will have a large number (thousands, possibly millions) of documents in a Firebase collection. Each document has the fields "registered_value", "date" and "code", which identify its value, registration date and type.
I need to show users the average value per day of the documents that have the same code.
Users can add new documents, edit existing ones, or delete the ones they created.
Since I need this data frequently, it would be too expensive to read the entire collection every time a user loads a page that displays this info. Is there a better way to store or get the average?
I'm working with ReactJS and Node.js.
There's nothing built into Firestore to get aggregated values like that. The common approach is to store the aggregated value in a separate document somewhere, and update that document upon every relevant write operation. You can do this either from client-side code, or from Cloud Functions.
For more on this, see the Firebase documentation on aggregation queries and on distributed counters.
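The bookkeeping behind such a separate aggregate document can be sketched as follows. It assumes one aggregate per code and day holding a running `sum` and `count`; the function name `applyChange` and that layout are illustrative assumptions, not a Firebase API.

```javascript
// Update a running aggregate for one (code, date) pair.
// oldValue is null for a newly created document,
// newValue is null for a deleted document,
// both are numbers when a document is edited.
function applyChange(agg, oldValue, newValue) {
  const sum = agg.sum - (oldValue ?? 0) + (newValue ?? 0);
  const count =
    agg.count - (oldValue == null ? 0 : 1) + (newValue == null ? 0 : 1);
  return { sum, count, average: count === 0 ? null : sum / count };
}

// Example: two documents are added, one is edited, one is deleted.
let agg = { sum: 0, count: 0 };
agg = applyChange(agg, null, 10); // create 10
agg = applyChange(agg, null, 20); // create 20
agg = applyChange(agg, 20, 40);   // edit 20 -> 40
agg = applyChange(agg, 10, null); // delete 10
```

Each write then touches only the aggregate document, and the pages that display the average read a single document instead of the whole collection. In practice the read-and-update of the aggregate should happen atomically (for example inside a transaction) so that concurrent writes do not overwrite each other.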

MongoDB transactions scalability: WriteConflict

I am using:
MongoDB (v4.2.10)
Node.js + Mongoose
I am still in the development phase of my application and I am facing a potential problem (WriteConflict) when using transactions in MongoDB.
My application lets users add posts and like posts.
When a post is liked, here is what happens on the back end:
Start transaction
Find the post with the given ID in the Post collection
Add a document to the Like collection
Update the like count of the post in the Post collection
Update the likes given count of the user in the User collection
Update the likes received count of the user in the User collection
Add a notification document in the Notification collection
Commit the transaction
I'd say the average execution time is 1 second, which means that for 1 second I am locking:
1 Post document
2 User documents
I see that as a huge scalability problem, especially if a user has many popular posts that are often liked by others.
What would you recommend?
I don't think I can stop using transactions, because if something goes wrong during the execution of the function, I want to revert any changes made to the DB.
Thanks
Transactions are not required for this.
Once a like is in the likes collection, you can recalculate all counts from it, so a counter update that failed can be repaired later.
Notifications cannot be calculated (since you don't know when one was sent), but generally they are ephemeral anyway and if you have an outage requiring database restore users will most likely forgive some lost notifications.
When updating counts, use $inc instead of read-modify-write (writing out the new value).
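As a sketch of the $inc advice, the three counter updates could be expressed like this. The collection names and the counter fields `like_count`, `likes_given` and `likes_received` are assumptions, as is the helper name `likeCounterUpdates`.

```javascript
// Describe the counter updates for one like as independent $inc
// operations. Each one is atomic on its own document, so no
// read-modify-write cycle is needed.
// Collection and field names are assumptions for illustration.
function likeCounterUpdates(postId, likerId, authorId) {
  return [
    { collection: "posts", filter: { _id: postId }, update: { $inc: { like_count: 1 } } },
    { collection: "users", filter: { _id: likerId }, update: { $inc: { likes_given: 1 } } },
    { collection: "users", filter: { _id: authorId }, update: { $inc: { likes_received: 1 } } },
  ];
}

// Applying them with the Node.js driver (not run here):
// for (const u of likeCounterUpdates(postId, likerId, authorId)) {
//   await db.collection(u.collection).updateOne(u.filter, u.update);
// }
```

If one of these updates fails, the counter can be rebuilt by counting the matching documents in the likes collection.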

Are MongoDB queries client-side operations?

Let's say I have a document
{ "_id" : ObjectId("544946347db27ca99e20a95f"), "nameArray": [{"id":1 , first_name: "foo"}] }
Now I need to push an array into nameArray using $push. How is the document updated in that case? Is the document retrieved on the client, updated there, and the changes then sent back to the MongoDB server? Or is the entire operation carried out in the MongoDB database?
What you are asking here is if MongoDB operations are client-side operations. The short answer is NO.
In MongoDB, a query targets a specific collection of documents, as mentioned in the documentation, and a collection is a group of MongoDB documents that exists within a single database. Collections are to MongoDB what tables are to an RDBMS. So if a query targets a specific collection, it is performed at the database level, thus server-side. The same applies to data modification and aggregation operations.
Sometimes your operations may involve client-side processing because MongoDB doesn't provide a way to achieve what you want out of the box. Generally speaking, you only need that type of processing when you want to modify the structure of the documents in the collection or change the type of your fields. In such situations, you need to retrieve your documents, perform your modification on the client, and write the results back using bulk operations.
See the documentation:
Your array is inserted into the existing array as one element. If the array does not exist, it is created. If the target is not an array, the operation fails.
Nothing there says anything like "retrieving the element to the client and updating it there", so the operation is done completely on the database server side. I don't know of any operation that works the way you describe, unless you chain a query, a modification of the item in your client, and an update. But those are separate operations and not one single command.
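To illustrate, here is a small sketch of the update documents involved. The helper names are hypothetical; `$push` and its `$each` modifier are the server-side operators, and the client only sends these small documents, never the stored array.

```javascript
// Update document that appends the whole array as ONE element.
function pushAsSingleElement(items) {
  return { $push: { nameArray: items } };
}

// Update document that appends every item separately via $each.
function pushEach(items) {
  return { $push: { nameArray: { $each: items } } };
}

// Usage with the Node.js driver (not run here):
// await db.collection("people").updateOne(
//   { _id: someId },
//   pushEach([{ id: 2, first_name: "bar" }, { id: 3, first_name: "baz" }])
// );
```

The collection name `people` and the variable `someId` in the usage comment are placeholders.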

How to account for a failed write or add process in MongoDB

So I've been trying to wrap my head around this one for weeks, but I just can't seem to figure it out. MongoDB isn't equipped to deal with rollbacks as we typically understand them (i.e. when a client adds information to the database, like a username, but quits in the middle of the registration process). Now the DB is left with some "hanging" information that isn't associated with anything. How can MongoDB handle that? Or, if no one can answer that question, maybe someone can point me to a source or example that can? Thanks.
Before version 4.0, MongoDB did not support multi-document transactions, so you could not perform atomic multi-statement transactions to ensure consistency. Without them, a write is atomic only at the level of a single document. When dealing with NoSQL databases you need to validate your data as much as you can, because the database seldom complains about anything. There are some workarounds or patterns to achieve SQL-like transactions. For example, in your case, you can store the user's information in a temporary collection, check the data's validity, and store it in the users collection afterwards.
This should be straightforward, but things get more complicated when we deal with multiple documents. In this case, you need to create a designated collection for transactions. For instance,
transaction collection
{
id: ..,
state : "new_transaction",
value1 : values From document_1 before updating document_1,
value2 : values From document_2 before updating document_2
}
// update document 1
// update document 2
Ooohh!! something went wrong while updating document 1 or 2? No worries, we can still restore the old values from the transaction collection.
This pattern, known as compensation, mimics the transactional behavior of SQL.
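The idea can be sketched in memory as follows. This is an illustration only: the store is any object with Map-like `has`, `get`, `set` and `delete` methods, the function name `applyWithCompensation` is hypothetical, and a real implementation would persist the journal in the transaction collection so that it survives a crash.

```javascript
// Apply several writes; if one fails, restore the values recorded
// in the journal (the "transaction collection" of the pattern).
function applyWithCompensation(store, changes) {
  const journal = { state: "new_transaction", previous: [] };
  try {
    for (const { key, value } of changes) {
      // Record the old value BEFORE overwriting it.
      journal.previous.push({
        key,
        existed: store.has(key),
        value: store.get(key),
      });
      store.set(key, value);
    }
    journal.state = "done";
  } catch (err) {
    // Compensate: undo in reverse order using the recorded values.
    for (const prev of journal.previous.reverse()) {
      if (prev.existed) store.set(prev.key, prev.value);
      else store.delete(prev.key);
    }
    journal.state = "rolled_back";
  }
  return journal;
}
```

Unlike a real transaction, other readers can observe the intermediate state between the first write and the compensation.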

MongoDB merge one field into existing collection with Map/Reduce

I have a MongoDB database with 2 collections:
groups: { group_slug, members }
users: { id, display name, groups }
All changes to groups are done by changing the members array of the group to include the user ids.
I want to sync these changes across to the users collection by using map/reduce. How can I output the results of map/reduce into an existing collection (without merging or reducing)?
My existing code is here: https://gist.github.com/morgante/5430907
How can I output the results of map/reduce into an existing collection
You really can't do it this way. Nor is this really suggested behaviour. There are other solutions:
Solution #1:
Output the map / reduce into a temporary collection
Run a follow-up task that updates the primary data store from the temporary collection
Clean-up the temporary collection
Honestly, this is a safe way to do this. You can implement some basic retry logic in the whole loop.
Solution #2:
Put the change on a queue (e.g. "user subscribes to group").
Update both collections from separate workers that are listening for such events on the queue.
This solution may require a separate piece (the queue), but any large system is going to have such denormalization problems. So this will not be the only place you see this.
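For the follow-up task in Solution #1, the update of the users collection could be derived along these lines. This is a sketch that replaces the map/reduce output with plain JavaScript for brevity; the helper name `userGroupUpdates` is hypothetical, and the field names follow the schema in the question.

```javascript
// Invert groups -> members into per-user updates for the users
// collection. The result has the shape expected by bulkWrite().
function userGroupUpdates(groups) {
  const byUser = new Map();
  for (const { group_slug, members } of groups) {
    for (const userId of members) {
      if (!byUser.has(userId)) byUser.set(userId, []);
      byUser.get(userId).push(group_slug);
    }
  }
  return [...byUser].map(([id, slugs]) => ({
    updateOne: { filter: { id }, update: { $set: { groups: slugs } } },
  }));
}

// Usage with the Node.js driver (not run here):
// const allGroups = await db.collection("groups").find().toArray();
// await db.collection("users").bulkWrite(userGroupUpdates(allGroups));
```

Users who no longer belong to any group get no update here, so their groups field would need to be cleared in a separate step.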

Resources