Most efficient way to retrieve and save data to MongoDB via Mongoose - Node.js

Let's say we have a collection of user documents in a MongoDB database, and each document contains a large number of fields and a lot of data.
We may want to save the user's email to their document. So first we need to fetch the user's document and then save the email:
const account = await db.Account.findOne({ email }); account.email = value; await account.save();
As I said, the document has a lot of extra fields that we don't want to modify; we only want to select and modify the email field. Now I wonder:
1) Does this have any impact on performance and the duration of the request?
2) Do I need to select only the email field to improve performance?
What is the best practice here?
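One option sometimes used for this kind of single-field change is to send an update directly instead of loading the document first. The sketch below assumes Account is a Mongoose model passed in by the caller; updateEmail is a hypothetical helper name, while updateOne with $set is a standard Mongoose model call. Whether it is measurably faster depends on document size and on whether you rely on validation or save middleware, which updateOne does not run by default.

```javascript
// Hypothetical helper: updates only the email field without loading the document.
// `Account` is assumed to be a Mongoose model (or anything exposing updateOne).
function updateEmail(Account, currentEmail, newEmail) {
  // Only the filter and the $set payload are sent;
  // the other fields of the document are never read or rewritten.
  return Account.updateOne(
    { email: currentEmail },
    { $set: { email: newEmail } }
  );
}

// If the document is needed anyway, a projection limits what is fetched:
// const account = await Account.findOne({ email }).select("email");
```

Usage would look like `await updateEmail(db.Account, email, value)`. As far as I know, save() on a loaded document generally writes only the paths that changed, so the main cost of the findOne-then-save pattern is the read of the full document, which a projection can reduce.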

Related

How to fetch all documents from a firebase collection where each document has some sub collection and in sub collection there is a document?

I am making an admin dashboard. I want to show all users' details and their orders. When I try to fetch all documents in the users collection, it returns an empty result. For more context: in the users collection, each document has some sub-collections. In the account sub-collection, there is a document named details, where the user's account details are stored, as shown in the snapshots.
My code is
export function getUsers() {
return firebase.firestore().collection("users").get();
}
If you store the user's details directly in the user document instead of in the 'account' sub-collection, then fetching the "users" collection will return all users' documents with their data. If there's no particular reason for the sub-collection, I'd recommend doing this.
Another option is to use a collectionGroup query on "account", which fetches all documents from every sub-collection named "account", i.e. it gives you every user's account details.
const snap = await db.collectionGroup('account').get();
const users = snap.docs.map(d => ({ id: d.ref.parent.parent.id, data: d.data() }));
Here, id is user's document ID.
Firestore queries only access a single collection, or all collections with a specific name. There is no way to query a collection based on values in another collection.
The most common options are:
Query the parent collection first, then check the subcollection for each document. This approach works best if you have relatively few false positives in the parent collection.
Query all child collections with a collection group query, then check the parent document for each result. This approach works best if you have relatively few false positives in your child collection query.
Replicate the relevant information from the child documents into the parent document, and then query the parent collection based on that. For example, you could add a hasOrders field or an orderCount in the user document. This approach always gives optimal results while querying, but requires that you modify the code that writes the data to accommodate.
The third approach is typically the best for a scalable solution. If you come from a background in relational databases, this sort of data duplication may seem unnatural, but it is actually very common in NoSQL databases, where you often have to change your data model to allow the queries your app needs.
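The third approach can be sketched as a batched write that keeps the duplicated fields in step with the orders. This assumes the namespaced Firebase SDK used in the question and that orders live in an orders sub-collection under each user, which the original post does not confirm. recordOrder is a hypothetical helper, and db and FieldValue are passed in (in an app they would be firebase.firestore() and firebase.firestore.FieldValue).

```javascript
// Hypothetical helper: writes an order and updates the duplicated fields together.
function recordOrder(db, FieldValue, userId, order) {
  const userRef = db.collection("users").doc(userId);
  const orderRef = userRef.collection("orders").doc(); // auto-generated ID
  const batch = db.batch();
  batch.set(orderRef, order);
  // set with merge creates the user document if it does not exist yet
  // and leaves its other fields untouched.
  batch.set(
    userRef,
    { hasOrders: true, orderCount: FieldValue.increment(1) },
    { merge: true }
  );
  return batch.commit(); // both writes succeed or fail together
}
```

With fields like these in place, the dashboard could query the parent collection directly, for example `db.collection("users").where("hasOrders", "==", true).get()`.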
To learn more about this, I recommend reading NoSQL data modeling and watching Getting to know Cloud Firestore.

Dynamically accessing Mongodb database collection based in URL parameters

I have an API endpoint, api/get?subject=economics, and based on the subject parameter I access various collections in MongoDB. Right now I am using switch/case statements to pick the required collection based on the subject parameter. This is making my code very lengthy. Is there a way to access the collection directly from the subject parameter value? For example, instead of using this
const { subject } = req.query;
let data;
switch (subject) {
  case "economics":
    data = await economics.find();
    break;
}
I want to be able to use this
const { subject } = req.query;
const data = await subject.find(); // here subject would reference the model named by its value, e.g. economics or stats
I think a better way is to pass the model as an argument. Here is the code showing how I created a common controller:
https://gist.github.com/RMUSMAN/b7132fda6e945393882586c26b132e24
https://gist.github.com/RMUSMAN/c419d8149effb8514845946ad5b652f1
https://gist.github.com/RMUSMAN/f44ba2a20da35a2ce5ffd7517ea8fca8
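The idea of resolving the model from the parameter can also be sketched with a plain lookup object. In this sketch, models, resolveModel, getBySubject, and the stub objects are all hypothetical; in a real app the values would be the actual Mongoose models (for example economics). This is not the code from the gists above, just one possible shape.

```javascript
// Stand-ins for Mongoose models so the sketch runs on its own (assumption:
// real code would put the actual models here, e.g. { economics, stats }).
const models = {
  economics: { find: async () => [{ title: "Supply and demand" }] },
  stats: { find: async () => [{ title: "Normal distribution" }] },
};

// Returns the model registered for `subject`, or null for unknown values.
function resolveModel(subject) {
  return Object.prototype.hasOwnProperty.call(models, subject)
    ? models[subject]
    : null;
}

// Hypothetical Express-style handler using the lookup instead of a switch.
async function getBySubject(req, res) {
  const model = resolveModel(req.query.subject);
  if (!model) return res.status(400).json({ error: "Unknown subject" });
  res.json(await model.find());
}
```

An explicit map also acts as a whitelist: a query value such as "constructor" or a misspelled subject resolves to null instead of reaching the database layer.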

Sub documents vs Mongoose population

I have the following scenario:
A user can log in to a website. A user can add/delete a poll (a question with two options). Any user can give their opinion on a poll by selecting one of the options.
Considering the above scenario, I have three models: Users, Polls, and Options. They are as follows, in order of dependency:
Option Schema
var optionSchema = new Schema({
  optionName: {
    type: String,
    required: true,
  },
  optionCount: {
    type: Number,
    default: 0
  }
});
Poll Schema
var pollSchema = new Schema({
  question: {
    type: String,
    required: true
  },
  options: [optionSchema]
});
User Schema: parent schema
var usersSchema = new Schema({
  username: {
    type: String,
    required: true
  },
  email: {
    type: String,
    required: true,
    unique: true
  },
  password: String,
  polls: [pollSchema]
});
How do I implement the above relation between those documents? What exactly is Mongoose population? How is it different from subdocuments? Should I go for subdocuments or should I use Mongoose population?
Since MongoDB has no joins in the way relational databases do, population works something like a hidden join. It means that when you have the User model and populate the Poll model, Mongoose does something like this:
fetch User
fetch related Polls, by ObjectIds which are stored in User document
put fetched Polls documents into User document
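The three steps above can be mimicked in plain JavaScript. This is only a conceptual sketch using in-memory Maps as stand-ins for collections; usersById, pollsById, and populatePolls are hypothetical names, and Mongoose's real populate() runs queries against the database rather than reading from Maps.

```javascript
// In-memory stand-ins for the polls and users collections (illustrative data).
const pollsById = new Map([
  ["p1", { _id: "p1", question: "Tabs or spaces?" }],
  ["p2", { _id: "p2", question: "Cats or dogs?" }],
]);
const usersById = new Map([
  ["u1", { _id: "u1", username: "alice", polls: ["p1", "p2"] }],
]);

function populatePolls(userId) {
  // 1. fetch User
  const user = usersById.get(userId);
  if (!user) return null;
  // 2. fetch related Polls by the ids stored in the User document
  const polls = user.polls.map((id) => pollsById.get(id)).filter(Boolean);
  // 3. put the fetched Polls into a copy of the User document
  return { ...user, polls };
}
```

Note that the stored user still holds only ids; the full poll objects appear only in the returned copy, which is the sense in which population behaves like a join done at read time.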
When you instead make User the document and Polls a subdocument, it simply means you put all the data in a single document. One consequence is that, to fetch a user's polls, Mongoose doesn't need to run two queries (it only needs to fetch the User document, because the Polls data is already there).
But which is better to choose? It depends on the case.
If your Polls documents are referenced from other documents (you need access to Polls from documents User, A, B, C), it could be better to populate, but not necessarily. The advantage of populating is that when you need to change some Polls fields, you don't have to change that data in every document that refers to the Polls document (as you would if it were a subdocument). In that case, instead of updating User, A, B, and C, you only update the Polls document, which is convenient. I said it's not certain that populating is better here because I don't know how you need to retrieve your Polls data. If you store your data in a way that doesn't match how you read it, you will run into performance issues or have trouble fetching it easily.
Subdocuments are the basic way of storing data. They work well when Polls are referenced only from User. There is a performance advantage: Mongoose needs one query instead of the two used by population, and the update disadvantage mentioned above doesn't apply, because the Polls data is stored in a single place, so no other documents need updating.
Basically, MongoDB was designed to be used mostly with subdocuments; after all, it's a non-relational database. So in most cases I prefer to use subdocuments. I can't say which way will be better in your case, because I don't know what your full DB looks like or how you want to retrieve your data.
There is some useful info in the official documentation:
http://mongoosejs.com/docs/subdocs.html
http://mongoosejs.com/docs/populate.html
Take a look at those.
Edit
Since I prefer to fetch data easily, care about performance, and know that data redundancy is common in MongoDB, I would choose to store this data as subdocuments.

Mongodb: big data structure

I'm rebuilding my website, which is a search engine for nicknames from the most active forum in France: you search for a nickname and you get all of its messages.
My current database contains more than 60 GB of data, stored in a MySQL database. I'm now rewriting it into a MongoDB database, and after importing 1 million messages (1 message = 1 document), find() started to take a while.
The structure of a document is as such:
{
  "_id" : ObjectId(),
  "message": "<p>Hai guys</p>",
  "pseudo" : "mahnickname", //from a nickname (*pseudo* in my db)
  "ancre" : "774497928", //its id in the forum
  "datepost" : "30/11/2015 20:57:44"
}
I set the id ancre as unique, so I don't get the same entry twice.
Then the user enters the nickname and it finds all documents that have that nickname.
Here is the request:
Model.find({pseudo: "danickname"}).sort('-datepost').skip((r_page -1) * 20).limit(20).exec(function(err, bears)...
Should I structure it differently? Instead of having one document for each message, should I have one document per nickname and update it whenever I get a new message from that nickname?
I was using the first method with MySQL and it wasn't taking that long.
Edit: Or maybe should I just index the nicknames (pseudo)?
Thanks!
Here are some recommendations for your problem about big data:
The ObjectId already contains a timestamp. You can also sort on it. You could save on some disk space by removing the datepost field.
Do you absolutely need the ancre field? The ObjectId is already unique and indexed. If you absolutely need it and need to keep the datepost separate too, you could use your ancre value as the _id field.
As many mentioned, you should add an index on pseudo. This will make the "get all messages where the pseudo is mahnickname" search much faster.
If the number of messages per user is low, you could store all of them inside a single document per user. This would avoid having to skip to a specific page, which can be slow. However, be aware of the 16 MB document size limit. I would personally still keep them in multiple documents.
To keep queries fast, ensure that all your indexes fit in RAM. You can see the size of each index by running db.collection.stats() and looking at the indexSizes sub-document.
Would there be a way for you to not skip documents, but instead page by the time each document was written to the database? If so, use the datepost field or the timestamp in _id for your paging strategy. If you decide on using the datepost, make a compound index on pseudo and datepost.
As for your benchmarks, you can closely monitor MongoDB by using mongotop and mongostat.
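To make the ObjectId and paging suggestions concrete, here is a small sketch. It relies on the fact that the first 4 bytes of an ObjectId encode its creation time in seconds. objectIdToDate, buildPageFilter, and lastSeenId are hypothetical names, and ids are handled as hex strings here for simplicity (with the native driver you would likely wrap them in ObjectId).

```javascript
// The first 8 hex characters of an ObjectId are a Unix timestamp in seconds.
function objectIdToDate(hexId) {
  return new Date(parseInt(hexId.slice(0, 8), 16) * 1000);
}

// Builds a filter for the next page: same pseudo, ids older than the last one shown.
// Intended to be paired with .sort({ _id: -1 }).limit(pageSize) instead of .skip(...).
function buildPageFilter(pseudo, lastSeenId) {
  const filter = { pseudo };
  if (lastSeenId) filter._id = { $lt: lastSeenId };
  return filter;
}
```

A query for the next page might then look like `Model.find(buildPageFilter("mahnickname", lastSeenId)).sort({ _id: -1 }).limit(20)`. One caveat: the embedded time is when the ObjectId was generated, so for messages imported from the old MySQL database it would reflect import time rather than the original post time.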

Manipulating ref'd mongo records based on _id field

Ok so I have a pretty simple DB setup in a MEAN app (node, mongoose, mongo) where I have Book records, and User records. A book has a single Owner, and can have any number of shared users which are stored in an array in a field called sharedWith:. Originally I was storing the user records with an email address as the _id field. I now realize this was a dumb move on my part because if someone wants to change their email address it effectively cuts them off from their books.
The app is not live yet, so it's not a fatal mistake.
My question is, once I revert the User documents to using the original hash value for _id, and store those in the Owner and sharedWith fields in the book documents, will I have to query each hash just to retrieve the actual usable user data?
I know mongoose has a .populate() method which will resolve sub documents, but as for inserting them? Will I POST the users as email addresses, then query each and store the resulting hashes? I can do this manually, but I wanted to make sure there is not some secret mongo-sauce that can do this in the DB itself.
Thanks!
If you have the user's _id available in the frontend, you can share a book with them directly by adding the _id to the book's sharedWith array. If you don't have the user's _id in the frontend, you need to look it up by querying with the email and then store that _id in sharedWith. As for retrieving the books, populate is indeed the best option for getting the user data.
And to get all books shared with a user, you can do something like this:
Book.find({sharedWith:user1._id},function(err,docs){ });
This query can be made efficient with an index on sharedWith, but whether that is worthwhile depends on your use case.
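The "query by email, then store the _id" step can be sketched as follows. shareBookByEmail is a hypothetical helper, and User and Book are assumed to be Mongoose models passed in by the caller; findOne, select, updateOne, and $addToSet are standard Mongoose/MongoDB calls.

```javascript
// Hypothetical helper: shares a book with the user who owns `email`.
// `User` and `Book` are assumed to be Mongoose models passed in by the caller.
async function shareBookByEmail(User, Book, bookId, email) {
  // Look up only the _id of the user with this email.
  const user = await User.findOne({ email }).select("_id");
  if (!user) return false;
  // $addToSet avoids duplicate entries if the book is already shared with them.
  await Book.updateOne(
    { _id: bookId },
    { $addToSet: { sharedWith: user._id } }
  );
  return true;
}
```

This keeps email addresses out of the book documents entirely, so a user changing their email later does not affect which books they can see.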

Resources