How to sort data in mongodb - best practice - node.js

I'm rather new to working with MongoDB.
In my application, the user can create to-do lists. I save the data of these lists to my database using Node.js with the Express framework and Mongoose (with a ReactJS front end). However, the user is supposed to be able to create several to-do lists, and I'm not sure how best to organize the data so that I can always access the correct data for the corresponding list.
Let's say I have this schema:
var TodoSchema = new mongoose.Schema({
task: String,
prio: String,
updated_at: { type: Date, default: Date.now },
});
module.exports = mongoose.model("Todo", TodoSchema);
for my database called tododb.
I was first planning on creating a new collection for each new list, but this question ( how to create a new collection automatically in mongodb ) says that it would be much better to create one collection for all lists. However, I'm not sure how you would filter out the correct data in that case.
I imagine that I'm not the first person to encounter this problem, so how is it done usually? What other options do I have besides collections? And how would I access exactly the data that I need?
Edit: I was also thinking about just adding a field called "name" or something similar, where the user could enter a name for the list. When fetching the data, I would iterate over all documents and filter out the ones whose name matches. However, that seems terribly inefficient.

I'd model a todo list like the following:
{
"_id": "id of the todo list",
"name": "name of the todo list (e.g. daily tasks)",
"tasks" : [
{"name": "drink coffee", "priority": 1, "updated": "sometime" },
{"name": "write code", "priority": 2, "updated": "sometime" },
{"name": "drink tea", "priority": 3, "updated": "sometime" }
]
}
and then put them all in the same collection. If you need to split by user, just add a userId field to the todo list document.
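A minimal sketch of how this model addresses the "how do I access exactly the data I need" part of the question. The helper names (`listFilter`, `tasksByPriority`) are hypothetical, and the filter is assumed to be passed to a Mongoose query such as `find`. An index on `userId` would presumably keep the lookup from scanning the whole collection.

```javascript
// Hypothetical helper: builds the filter object for a query such as
// TodoList.find(filter). `userId` is the per-user field suggested above.
function listFilter(userId, listName) {
  const filter = { userId };
  // Only narrow by name when one was given; otherwise match all of the user's lists.
  if (listName !== undefined) {
    filter.name = listName;
  }
  return filter;
}

// Returns a copy of one list's embedded tasks, lowest priority number first.
function tasksByPriority(list) {
  return [...list.tasks].sort((a, b) => a.priority - b.priority);
}
```

Because the tasks live inside the list document, fetching one list returns all of its tasks in a single query, and no client-side filtering over unrelated lists is needed.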

Related

What is the most efficient way to perform CRUD operations to millions of documents in MongoDB

I am new to MongoDB and am currently working on a project where MongoDB is my primary database management system. I am using Mongoose for object data modeling. Suppose I have two collections called products and features. Each product may have multiple features, so this is a one-to-many relationship.
// products schema
const products = mongoose.Schema({
id: Number,
name: String,
description: String,
category: String,
price: Number,
}, {
strict: false,
});
// features schema
const features = mongoose.Schema({
id: Number,
product_id: Number,
feature: String,
value: String,
}, {
strict: false,
});
I have imported the documents for both collections from external .csv files, and the number of records in both collections is more than 3 million. My client-side application requires data about a particular product together with all of its features, like below:
{
productId: 3,
name: 'Denim Jeans',
description: '',
category: 'Cloth',
price: 40.00,
features: [
{
feature: 'Material',
value: 'Cotton',
},
{
feature: 'color',
value: 'blue',
},
....
]
}
Each product will not have more than 5-6 features. So what I wanted to do is embed the features as subdocuments in the products document, like the response above. I wrote a piece of code like the one below. It's not the exact same code, as I deleted it when it was not working, but the logic is the same.
db.products.find({}, (err, product) => {
// product -> array of all documents from products collection
// for each product, I am trying to find the corresponding feature from
// the features collections and embed it to each product document
product.forEach(item => {
db.features.find({product_id: item.id}, (err, feature) => {
// feature -> array of all the features of a product
// embed the feature array to each individual product item
item.features = feature;
})
})
})
Now, the issue is that when I run the above code, I get errors like OutOfMemory, as it is trying to read millions of records and my memory cannot hold them all. My question is: what is the best way to retrieve all the products and, for each individual product, query its corresponding features and embed them inside the product document?
I have a couple of ideas. Kindly correct me if I am wrong. Instead of storing all the products and their features in memory, I want to store them on disk and update the individual products using the Bulk API of MongoDB. But in that case, how do I achieve this? I am also concerned about the performance. What is the best practice to follow in this case? Or should I keep them in separate collections, make two queries from the application server, and package the response there? Or should I use some kind of aggregation pipeline at the database level? Thanks in advance.
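One way to keep memory bounded, following the Bulk API idea above, is to process the products in fixed-size batches and build one `bulkWrite` operation per product. The sketch below covers only the pure part: grouping one batch's features and building the operations. The function name `buildEmbedOps` is hypothetical, and reading each batch (for example with a cursor on products plus a `product_id: { $in: [...] }` query on features) is assumed to happen elsewhere.

```javascript
// Builds bulkWrite operations that embed each product's features.
// `products` and `features` are plain objects for ONE batch, not the whole collection.
function buildEmbedOps(products, features) {
  // Group the batch's features by product_id in a single pass.
  const byProduct = new Map();
  for (const { product_id, feature, value } of features) {
    if (!byProduct.has(product_id)) {
      byProduct.set(product_id, []);
    }
    byProduct.get(product_id).push({ feature, value });
  }
  // One updateOne per product; products without features get an empty array.
  return products.map((product) => ({
    updateOne: {
      filter: { id: product.id },
      update: { $set: { features: byProduct.get(product.id) || [] } },
    },
  }));
}
```

The returned array has the operation shape that `bulkWrite` accepts, so each batch can be sent in one round trip and only one batch is ever held in memory.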

Adding value to already declared MongoDB object with mongoose Schema

I am new to MongoDB and mongoose. I am trying to create a Node & MongoDB auction app. So since it is actually an online auction, users should be able to bid for items. I successfully completed the user registration, sign in page and authentication process, however, I am a bit stuck in the bidding page.
I created a Schema using mongoose and each item for auction is saved in the database. I want to add name and price of each user who bid for the item in the same object in MongoDB, like this:
{
name: "valuable vase from 1700s",
owner: "John Doe",
itemId: 100029,
bids: {
100032: 30000,
100084: 34000
}
}
So each user will have ids like 100032: 30000, and when they bid, their "account id: price" will be added under bids in the database object of the item.
I did some research and found some ways to solve the problem, but I want to know whether what I want to do is possible and whether it is the right solution.
Thanks for giving me your time!
There are indeed a couple of ways to achieve what you want.
In my opinion, a collection called ItemBids, where each document includes this data structure, will benefit you the most.
{
  itemId: ObjectId, // reference to the item document
  accountId: ObjectId, // reference to the account
  bid: Number // the bid value
}
This pattern is suitable for your case because you can easily query the bids by whatever you want: you can get all the bids of an account, you can get all the bids on an item, and you can sort them natively in Mongo by the bid price.
Every time there's a bid, you insert a new document to this collection.
Another option is embedding an array of Bids objects in the item Object.
Each Bid object should include:
bids: [{
  account: ObjectId, // reference to the account
  price: Number
}]
The downside here is that querying and accessing the bids will require more complex queries.
You can read more about the considerations here:
https://docs.mongodb.com/manual/core/data-model-design
https://coderwall.com/p/px3c7g/mongodb-schema-design-embedded-vs-references
The way you decided to implement your functionality is a little bit complicated.
It is not impossible to do that, but the better way is to use an array of objects instead of a single object, like this:
{
name: '',
..
..
bids: [{
user: 100032,
price: 30000
}, {
user: 100084,
price: 34000
}]
}
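If items were already saved with the object form from the question, converting them to the array form is a small transformation. A sketch with hypothetical function names; it assumes account ids are numeric, as in the question's example.

```javascript
// Converts { "100032": 30000, ... } into [{ user: 100032, price: 30000 }, ...].
function bidsObjectToArray(bids) {
  return Object.entries(bids).map(([user, price]) => ({
    user: Number(user), // object keys are always strings, so convert back
    price,
  }));
}

// With an array, finding the highest bid is a simple reduction.
// Returns null when there are no bids yet.
function highestBid(bidList) {
  return bidList.reduce(
    (best, bid) => (best === null || bid.price > best.price ? bid : best),
    null
  );
}
```

The array form also avoids using user ids as field names, which makes the bids easier to query and index by a fixed path such as `bids.user`.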

Create View from multiple collections MongoDB

I have the following Mongoose schemas (truncated to hide project-sensitive information) from a healthcare project.
let PatientSchema = mongoose.Schema({ _id: String })
let PrescriptionSchema = mongoose.Schema({ _id: String, patient: { type: Number, ref: 'Patient' }, createdAt: Date })
let ReportSchema = mongoose.Schema({ _id: String, patient: { type: Number, ref: 'Patient' }, createdAt: Date })
let EventsSchema = mongoose.Schema({ _id: String, patient: { type: Number, ref: 'Patient' }, createdAt: Date })
There is a UI screen in the mobile and web apps called Health history, where I need to paginate the entries from prescriptions, reports and events, sorted by createdAt. So I am building a REST endpoint to get this heterogeneous data. How do I achieve this? Is it possible to create a "View" from multiple schema models so that I don't have to load the contents of all 3 schemas to fetch one page of entries? The schema of my "View" should look like the one below, so that I can run additional queries on it (e.g. find the last report):
{recordType: String /* prescription/report/event */, createdDate: Date, data: Object /* content from any of the 3 tables */}
I can think of three ways to do this.
Imho the easiest way to achieve this is by using an aggregation, something like this:
db.Patients.aggregate([
  { $match: { _id: <somePatientId> } },
  {
    $lookup: {
      from: "Prescription", // replicate this for Report and Event
      localField: "_id",
      foreignField: "patient",
      as: "prescriptions" // or reports or events
    }
  },
  { $unwind: "$prescriptions" }, // or reports or events
  { $sort: { "prescriptions.createdAt": -1 } },
  { $skip: <positive integer> },
  { $limit: <positive integer> }
])
You'll have to adapt it further, to also get the correct createdDate. For this, you might want to look at the $replaceRoot operator.
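A sketch of what that adaptation could look like for a single record type, written as a function that only builds the pipeline array. The function name `historyPipeline`, the `records` alias, and the assumption that `createdAt` is a top-level field on the looked-up documents are all mine; the collection name is passed in by the caller rather than guessed.

```javascript
// Builds an aggregation pipeline that reshapes one record type into
// { recordType, createdDate, data } and returns one page of it.
function historyPipeline(patientId, fromCollection, recordType, skip, limit) {
  return [
    { $match: { _id: patientId } },
    {
      $lookup: {
        from: fromCollection,
        localField: "_id",
        foreignField: "patient",
        as: "records",
      },
    },
    // One output document per looked-up record.
    { $unwind: "$records" },
    // Promote the record to the root, in the "View" shape from the question.
    {
      $replaceRoot: {
        newRoot: {
          recordType: { $literal: recordType },
          createdDate: "$records.createdAt",
          data: "$records",
        },
      },
    },
    { $sort: { createdDate: -1 } },
    { $skip: skip },
    { $limit: limit },
  ];
}
```

The array would be passed to `aggregate` on the patients collection. Combining all three record types into one sorted page needs an extra step that this sketch does not cover.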
The second option is to create a new "meta"-collection, that holds your actual list of events, but only holds a reference to your patient as well as the actual event using a refPath to handle the three different event types. This solution is the most elegant, because it makes querying your data way easier, and probably also more performant. Still, it requires you to create and handle another collection, which is why I didn't want to recommend this as the main solution, since I don't know if you can create a new collection.
As a last option, you could create virtual populate fields in Patient, that automatically fetch all prescriptions, reports and events. This has the disadvantage that you can not really sort and paginate properly...
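For the second option, the "meta"-collection entry could look roughly like the sketch below. Everything here is an assumption for illustration: the definition object is meant to be passed to `mongoose.Schema`, the model names in `enum` are guesses at how the three models are registered, and `toHistoryEntry` is a hypothetical helper called whenever a prescription, report or event is saved.

```javascript
// Plain definition object for the meta-collection (assumed model names).
// `refPath` tells Mongoose to read the target model name from `recordModel`.
const historyEntryDefinition = {
  patient: { type: Number, ref: "Patient" },
  createdDate: Date,
  recordModel: { type: String, enum: ["Prescription", "Report", "Event"] },
  record: { type: String, refPath: "recordModel" },
};

// Builds the meta entry for a freshly saved record of the given model.
function toHistoryEntry(recordModel, doc) {
  return {
    patient: doc.patient,
    createdDate: doc.createdAt,
    recordModel,
    record: doc._id,
  };
}
```

With such a collection, one page of history becomes a single sorted query on `patient` and `createdDate` with skip and limit, and `populate('record')` resolves each entry to the right model.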

Mongoose: How to populate two levels deep without populating the fields of the first level in MongoDB?

Here is my Mongoose Schema:
var SchemaA = new Schema({
field1: String,
.......
fieldB : { type: Schema.Types.ObjectId, ref: 'SchemaB' }
});
var SchemaB = new Schema({
field1: String,
.......
fieldC : { type: Schema.Types.ObjectId, ref: 'SchemaC' }
});
var SchemaC = new Schema({
field1: String,
.......
.......
.......
});
When I access SchemaA using a find query, I want to get the fields of SchemaA along with those of SchemaB and SchemaC, in the same way a join works in a SQL database.
This is my approach:
SchemaA.find({})
.populate('fieldB')
.exec(function (err, result){
SchemaB.populate(result.fieldC,{path:'fieldB'},function(err, result){
.............................
});
});
The above code is working perfectly, but the problem is:
I want to have the fields of SchemaC through SchemaA, and I don't want to populate the fields of SchemaB.
The reason for not wanting the properties of SchemaB is that the extra population slows the query unnecessarily.
Long story short:
I want to populate SchemaC through SchemaA without populating SchemaB.
Can you please suggest any way/approach?
As an avid mongodb fan, I suggest you use a relational database for highly relational data - that's what it's built for. You are losing all the benefits of mongodb when you have to perform 3+ queries to get a single object.
Buuuuuut, I know that comment will fall on deaf ears. Your best bet is to be as conscious as you can about performance. Your first step is to limit the fields to the minimum required. This is just good practice even with basic queries and any database engine - only get the fields you need (eg. SELECT * FROM === bad... just stop doing it!). You can also try doing lean queries to help save a lot of post-processing work mongoose does with the data. I didn't test this, but it should work...
SchemaA.find({}, 'field1 fieldB', { lean: true })
.populate({
path: 'fieldB',
select: 'fieldC',
options: { lean: true }
}).exec(function (err, result) {
// not sure how you are populating "result" in your example, as it should be an array,
// but you said your code works... so I'll let you figure out what goes here.
});
Also, a very "mongo" way of doing what you want is to save a reference in SchemaC back to SchemaA. When I say "mongo" way of doing it, you have to break away from your years of thinking about relational data queries. Do whatever it takes to perform fewer queries on the database, even if it requires two-way references and/or data duplication.
For example, if I had a Book schema and an Author schema, I would likely save the author's first and last name in the Books collection, along with an _id reference to the full profile in the Authors collection. That way I can load my Books in a single query, still display the author's name, and then generate a hyperlink to the author's profile: /author/{_id}. This is known as "data denormalization", and it has been known to give people heartburn. I try to use it on data that doesn't change very often, like people's names. On the occasion that a name does change, it's trivial to write a function to update all the names in multiple places.
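A minimal sketch of the Book/Author example above, using plain objects only. The field names (`firstName`, `lastName`, the nested `author` object) and the function names are assumptions for illustration, not something the answer specifies.

```javascript
// Builds a book document that carries a copy of the author's name
// plus a reference to the full author profile.
function toBookDoc(title, author) {
  return {
    title,
    author: {
      _id: author._id,
      firstName: author.firstName,
      lastName: author.lastName,
    },
  };
}

// Builds the filter and update for propagating a name change to every
// book by that author, e.g. via an updateMany(filter, update) call.
function renameAuthorUpdate(authorId, firstName, lastName) {
  return {
    filter: { "author._id": authorId },
    update: {
      $set: { "author.firstName": firstName, "author.lastName": lastName },
    },
  };
}

// The profile link mentioned above, built from the stored reference.
function authorLink(book) {
  return `/author/${book.author._id}`;
}
```

The cost of the duplication is the second function: every rename has to touch all of that author's books, which is why the approach suits rarely changing data.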
SchemaA.find({})
.populate({
path: "fieldB",
populate:{path:"fieldC"}
}).exec(function (err, result) {
//this is how you can get all key value pair of SchemaA, SchemaB and SchemaC
//example: result.fieldB.fieldC._id(key of SchemaC)
});
Why not add a ref to SchemaC on SchemaA? The way you currently have it, there is no way to bridge from SchemaA to SchemaC without SchemaB, unless you populate SchemaB with no other data than a ref to SchemaC.
As explained in the docs under Field Selection, you can restrict what fields are returned.
.populate('fieldB') becomes .populate('fieldB', 'fieldC -_id'). The -_id is required to omit the _id field, just like when using select().
I think this is not possible. When a document in A refers to a document in B, and that document refers to another document in C, how can the document in A know which document in C to refer to without any help from B?

Mongoose: populate() / DBref or data duplication?

I have two collections:
Users
Uploads
Each upload has a User associated with it, and I need to know their details when an Upload is viewed. Is it best practice to duplicate this data inside the Uploads record, or to use populate() to pull in these details from the Users collection referenced by _id?
OPTION 1
var UploadSchema = new Schema({
_id: { type: Schema.ObjectId },
_user: { type: Schema.ObjectId, ref: 'users'},
title: { type: String },
});
OPTION 2
var UploadSchema = new Schema({
_id: { type: Schema.ObjectId },
user: {
name: { type: String },
email: { type: String },
avatar: { type: String },
//...etc
},
title: { type: String },
});
With 'Option 2' if any of the data in the Users collection changes I will have to update this across all associated Upload records. With 'Option 1' on the other hand I can just chill out and let populate() ensure the latest User data is always shown.
Is the overhead of using populate() significant? What is the best practice in this common scenario?
If you need to query on your users, keep users separate. If you need to query on your uploads, keep uploads separate.
Another question you should ask yourself is: every time I need this data, do I need the embedded objects (and vice versa)? How many times will this data be updated? How many times will it be read?
Think about a friendship request:
If each time you need the request you also need the user who made it, then embed the request inside the user document.
You will be able to create an index on the embedded object too, and your search will be a single query: fast and consistent.
Just a link to my previous reply on a similar question:
Mongo DB relations between objects
I think this post will be right for you http://www.mongodb.org/display/DOCS/Schema+Design
Use Cases
Customer / Order / Order Line-Item
Orders should be a collection. customers a collection. line-items should be an array of line-items embedded in the order object.
Blogging system.
Posts should be a collection. post author might be a separate collection, or simply a field within posts if only an email address. comments should be embedded objects within a post for performance.
Schema Design Basics
Kyle Banker, 10gen
http://www.10gen.com/presentation/mongosf2011/schemabasics
Indexing & Query Optimization
Alvin Richards, Senior Director of Enterprise Engineering
http://www.10gen.com/presentation/mongosf-2011/mongodb-indexing-query-optimization
These two videos are the best I have seen on MongoDB, imho.
populate() is just a query. So the overhead is whatever that query costs, which is a find() on your model.
Also, best practice for MongoDB is to embed what you can. It will result in a faster query. It sounds like you'd be duplicating a ton of data though, which makes relations (linking) a good fit.
"Linking" is just putting an ObjectId in a field from another model.
Here is the Mongo Best Practices http://www.mongodb.org/display/DOCS/Schema+Design#SchemaDesign-SummaryofBestPractices
Linking/DBRefs http://www.mongodb.org/display/DOCS/Database+References#DatabaseReferences-SimpleDirect%2FManualLinking
