mongodb create index or multiple collection? - node.js

I have a collection that will insert a few million documents every year. My collection looks like this (using mongoose):
var mongoose = require("mongoose");
var Schema = mongoose.Schema;
var MySchema = new Schema({
schoolID: {
type: mongoose.Schema.Types.ObjectId, ref: 'School'
},
kelasID: {
type: mongoose.Schema.Types.ObjectId, ref: 'Kelas'
},
studentID: {
type: mongoose.Schema.Types.ObjectId, ref: 'Students'
},
positiveID: {
type: mongoose.Schema.Types.ObjectId, ref: 'Positive'
},
teacherID: {
type: mongoose.Schema.Types.ObjectId, ref: 'User'
},
actionName: {
type: String,
},
actionDate: {
type: String
},
actionTime: {
type: String
},
actionMonth: {
type: Number
},
actionYear: {
type: Number
},
points: {
type: Number
},
multiply: {
type: Number
},
totalPoints: {
type: Number
},
dataType: {
type: Number,
default: 1 //1-normal, 2-attendance, 3-notifications, 4-parent app
},
remarks: {
type: String,
},
remarks2: {
type: String,
},
status: {
type: Number, //28 Dec 2018: Currently only used when dataType=2 (attendance). 1:On-Time, 2:Late
},
});
MySchema.index({ schoolID : 1}, {kelasID : 1}, {studentID : 1}, {positiveID : 1}, {actionDate : 1})
module.exports = mongoose.model('Submission', MySchema);
As the collection grows, queries against it are getting slower. I have been thinking of manually creating a new collection for each year starting next year (named Submission2021, Submission2022, and so on), but that would require modifying quite a lot of code, not to mention the hassle of doing something like
var mySubmission;
if (year === 2021) {
  mySubmission = new Submission2021();
} else if (year === 2022) {
  mySubmission = new Submission2022();
} else if (year === 2023) {
  mySubmission = new Submission2023();
}
mySubmission.schoolID = 123
mySubmission.kelasID = 321
mySubmission.save()
So would an index that includes the year be better for me? My queries search on various combinations of schoolID, kelasID, studentID, positiveID, teacherID, and actionDate, so I don't think a compound index on the year plus the other fields is a good idea, right?

Only analytical column stores offer generally good performance for queries across any dimension, so you will have to consider this basic tradeoff: how many indexes do you wish to create vs. insert speed. In MongoDB, compound indexes work "left to right", so given an index created like this:
db.collection.createIndex({year:1, schoolID:1, studentID:1})
then find({year:2020}), find({year:2020,schoolID:"S1"}), and find({year:2020,schoolID:"S1",studentID:"X1"}) will all run fast, and the last one will run really fast because it is highly selective. But find({schoolID:"S1"}) will not, because the "leading" component year (actionYear in your schema) is not present. You can of course create multiple indexes. Another thing to consider is studentID: looking up one student's records is a common query, and efficiently narrowing the search by year is a natural thing to do. I might recommend starting with these two indexes:
db.collection.createIndex({studentID:1}); // not unique: one student has many submissions
db.collection.createIndex({year:1, schoolID:1}); // compound
These will produce "customary and expected" query results rapidly. Of course, you can add more indexes, and at a few million documents per year, I don't think you have to worry about insert performance.
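To make the "left to right" rule concrete, here is a small sketch in plain JavaScript. It is only a simplified model of the prefix rule, not MongoDB's actual query planner, and the function name usablePrefixLength is made up for illustration. It counts how many leading fields of a compound index a query constrains; 0 means the index cannot narrow the search at all.

```javascript
// Simplified model of the "left to right" rule; not MongoDB's real planner.
// Returns how many leading index fields the query constrains.
function usablePrefixLength(indexFields, queryFields) {
  const wanted = new Set(queryFields);
  let n = 0;
  for (const field of indexFields) {
    if (!wanted.has(field)) break; // the first gap ends the usable prefix
    n += 1;
  }
  return n;
}

const index = ["year", "schoolID", "studentID"];
console.log(usablePrefixLength(index, ["year"]));                          // 1
console.log(usablePrefixLength(index, ["year", "schoolID", "studentID"])); // 3
console.log(usablePrefixLength(index, ["schoolID"]));                      // 0
console.log(usablePrefixLength(index, ["year", "studentID"]));             // 1
```

The last case shows that skipping schoolID still lets the year part of the index help, but studentID then has to be filtered without it.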

Related

How to increase count and validate the user is liked or not using nodejs?

I need to increase the like and dislike counts in videoSchema based on likedislikeSchema. Please tell me how to implement it.
For example, if a user likes a video, the like count in videoSchema should be incremented, and the videoId and userId should be added to likedislikeSchema. Based on likedislikeSchema, we need to check whether the user has already liked the video or not.
My Like Dislike Model
const mongoose = require('mongoose')
const { ObjectId } = mongoose.Schema.Types
const likedislikeSchema = mongoose.Schema(
{
videoId: {
type: ObjectId,
ref: 'video',
},
userId: {
type: ObjectId,
ref: 'user',
},
like: {
type: Number,
default: 0,
},
dislike: {
type: Number,
default: 0,
},
},
{
timestamps: true,
}
)
const LikedislikeModel = mongoose.model('Likedislike', likedislikeSchema)
module.exports = LikedislikeModel
Based on it, I need to increase the count in the Video schema:
const mongoose = require('mongoose')
const { ObjectId } = mongoose.Schema.Types
const videoSchema = mongoose.Schema(
{
title: {
type: String,
required: true,
},
description: {
type: String,
required: true,
},
likes: {
type: Number,
default: 0,
},
dislikes: {
type: Number,
default: 0,
},
},
{
timestamps: true,
}
)
const VideoModel = mongoose.model('Video', videoSchema)
module.exports = VideoModel
How to do that depends on how strongly you want the reported number of likes/dislikes to be correct.
Loosely
If it is acceptable for the reported number of likes to be off by a few percent temporarily, i.e. it will be corrected within a short period of time, you could do this update in multiple steps:
Create an index in the likedislikes collection on:
{
videoId: 1,
userId: 1
}
When processing a new like or dislike, use Model.findOneAndUpdate with both the upsert and rawResult options.
If the user already has a like or dislike for that video, rawResult will cause the original document to be returned, so you can calculate the overall changes to the video's counts.
Use findOneAndUpdate in the videos collection to apply the changes.
Since the above processes the updates as separate steps, it is possible for the first update to succeed but the second to fail. If you don't handle every possible variation of that in your code, the counts in the video document can get out of sync with the likedislikes collection.
To remedy that, periodically iterate the videos, and run an aggregation to tally the count of likes and dislikes, and update the video document.
This will give you reasonably accurate counts normally, with precise counts when the periodic count is run.
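The "calculate the overall changes" step can be sketched as a small pure function. This is only an illustration: the helper name countDelta is hypothetical, `previous` stands for the user's earlier like/dislike document (or null if there was none), and `next` is the state that was just written.

```javascript
// Hypothetical helper: works out the change to apply to the video's counters.
// `previous` is the user's earlier like/dislike document (null if none),
// `next` is the state just written, e.g. { like: 1, dislike: 0 }.
function countDelta(previous, next) {
  const before = previous || { like: 0, dislike: 0 };
  return {
    likes: next.like - before.like,
    dislikes: next.dislike - before.dislike,
  };
}

// First reaction from this user: one more like.
console.log(countDelta(null, { like: 1, dislike: 0 }));
// Switching from dislike to like: +1 like, -1 dislike.
console.log(countDelta({ like: 0, dislike: 1 }, { like: 1, dislike: 0 }));
// Repeating the same like changes nothing.
console.log(countDelta({ like: 1, dislike: 0 }, { like: 1, dislike: 0 }));
```

The returned object uses the likes/dislikes field names from the video schema, so it could be passed as the $inc value in the second update.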
Strongly
If for some reason you cannot tolerate any imprecision in the counts, you should use multi-document transactions to ensure that both updates either succeed or fail together.
Note that with transactions there will be some document-locking, so individual updates may take a bit longer than the loose option.
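A rough sketch of the transactional variant, under some assumptions: `conn` is a mongoose connection to a deployment that supports transactions (a replica set), `Likedislike` and `Video` are the two models from the question, and the function name is made up. It only handles the "like" case; treat it as an outline rather than a finished implementation.

```javascript
// Sketch only. `conn` is assumed to be a mongoose connection to a replica set
// (transactions need one); `Likedislike` and `Video` are the question's models.
async function likeVideoInTransaction(conn, Likedislike, Video, videoId, userId) {
  const session = await conn.startSession();
  try {
    await session.withTransaction(async () => {
      // Without `new: true` this resolves to the pre-update document,
      // or null when the upsert had to create it.
      const previous = await Likedislike.findOneAndUpdate(
        { videoId, userId },
        { $set: { like: 1, dislike: 0 } },
        { upsert: true, session }
      );
      const before = previous || { like: 0, dislike: 0 };
      // Same session, so both writes commit or abort together.
      await Video.updateOne(
        { _id: videoId },
        { $inc: { likes: 1 - before.like, dislikes: 0 - before.dislike } },
        { session }
      );
    });
  } finally {
    await session.endSession();
  }
}
```

Passing the connection and models in as arguments keeps the function easy to exercise with stand-ins before wiring it to a real database.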

How to populate sub document of another model in mongoose?

I have two MongoDB models, as follows.
const CompanySchema = new Schema(
{
sections: [{
name: { type: String },
budgets: [{ // indicates from CalcSchema
index: { type: Number },
title: { type: String },
values: [Number],
sum: { type: Number, default: 0 },
}],
}]
},
{ timestamps: true }
);
const CalcSchema = new Schema({
budget: {
type: Schema.Types.ObjectId, // I want to populate this field. this indicates budget document in Company model
ref: "Company.sections.budgets" //it's possible in mongoose?
},
expense: {
type: Number,
default: 0
}
});
The budget field refers to one of the entries in the budgets field of CompanySchema.
So I want to populate it when I fetch Calc data.
But I don't know how to populate an embedded document.
I tried setting the ref value to ref: "Company.sections.budgets", but it doesn't work.
Can anyone please help?
In the end, I found the answer myself.
There is a useful plugin for it:
https://github.com/QuantumGlitch/mongoose-sub-references-populate#readme
I also learned that my schema structure was wrong. It is an anti-pattern in MongoDB.

Retweet schema in MongoDB

What is the best way to model a retweet schema in MongoDB? It is important that I have the createdAt times of both the original message and the retweet because of pagination: I use createdAt as the cursor for the GraphQL query.
I also need a flag for whether the message itself is a retweet or an original, plus id references to the original message, the original user, and the reposting user.
I came up with two solutions. The first is to keep the reposters' ids and createdAt times in an array in the Message model. The downside is that I have to generate the timeline every time, and for subscriptions it's not clear which message to push to the client.
The second is to treat a retweet as a message of its own. I have createdAt and reposterId in place, but there is a lot of duplication: if I add a like to a message, I have to push it into the array of every single retweet.
I could use help with this: what is the most efficient way to do it in MongoDB?
First way:
import mongoose from 'mongoose';
const messageSchema = new mongoose.Schema(
{
text: {
type: mongoose.Schema.Types.String,
required: true,
},
userId: {
type: mongoose.Schema.Types.ObjectId,
ref: 'User',
required: true,
},
likesIds: [{ type: mongoose.Schema.Types.ObjectId, ref: 'User' }],
reposts: [
{
reposterId: {
type: mongoose.Schema.Types.ObjectId,
ref: 'User',
},
createdAt: { type: Date, default: Date.now },
},
],
},
{
timestamps: true,
},
);
const Message = mongoose.model('Message', messageSchema);
Second way:
import mongoose from 'mongoose';
const messageSchema = new mongoose.Schema(
{
text: {
type: mongoose.Schema.Types.String,
required: true,
},
userId: {
type: mongoose.Schema.Types.ObjectId,
ref: 'User',
required: true,
},
likesIds: [{ type: mongoose.Schema.Types.ObjectId, ref: 'User' }],
isReposted: {
type: mongoose.Schema.Types.Boolean,
default: false,
},
repost: {
reposterId: {
type: mongoose.Schema.Types.ObjectId,
ref: 'User',
},
originalMessageId: {
type: mongoose.Schema.Types.ObjectId,
ref: 'Message',
},
},
},
{
timestamps: true,
},
);
const Message = mongoose.model('Message', messageSchema);
export default Message;
Option 2 is the better choice here. I'm operating with the assumption that this is a Twitter re-tweet or Facebook share like functionality. You refer to this functionality as both retweet and repost so I'll stick to "repost" here.
Option 1 creates an efficiency problem: to find the reposts made by a user, the database has to scan the reposts array of every document in the messages collection to be sure it has found every matching reposterId. Storing ids in arrays in collection X that reference collection Y is great if you want to traverse from X to Y. It's not as nice if you want to traverse from Y to X.
With option 2, you can specify a more classic one-to-many relationship between messages and reposts that will be simpler and more efficient to query. Reposts and non-repost messages alike will ultimately be placed into messageSchema in the order the user made them, making organization easier. Option 2 also makes it easy to let reposting users add text of their own to the repost, which can be displayed alongside the repost in the view this feeds into. This is popular on Facebook, where people add context to the things they share.
My one question is, why are three fields being used to track reposts in Option 2?
isReposted, repost.reposterId and repost.originalMessageId provide redundant data. All that you should need is an originalMessageId field that, if not null, contains a messageSchema key and, if null, signifies that the message is not itself a repost. If you really need it, the userId of the original message's creator can be found in that message when you query for it.
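As a small illustration of that single-field idea, here is a plain JavaScript sketch. The helper names isRepost and timelinePage and the sample data are made up. It treats a message as a repost exactly when originalMessageId is set and pages through messages newest-first using createdAt as the cursor, as in the question.

```javascript
// Sketch: a message is a repost exactly when originalMessageId is set.
// Field names follow the question's second schema; the sample data is made up.
const isRepost = (message) => message.originalMessageId != null;

// Newest-first timeline page, using createdAt as the cursor.
function timelinePage(messages, cursor, limit) {
  return messages
    .filter((m) => cursor == null || m.createdAt < cursor)
    .sort((a, b) => b.createdAt - a.createdAt)
    .slice(0, limit);
}

const messages = [
  { _id: "m1", userId: "alice", text: "hello", createdAt: new Date("2020-01-01"), originalMessageId: null },
  { _id: "m2", userId: "bob", text: "look at this", createdAt: new Date("2020-01-03"), originalMessageId: "m1" },
  { _id: "m3", userId: "alice", text: "again", createdAt: new Date("2020-01-02"), originalMessageId: null },
];

const page = timelinePage(messages, null, 2);
console.log(page.map((m) => [m._id, isRepost(m)]));
```

In a real application the filtering and sorting would be done by the database query, not in memory. The point is only that the flag and the reposter can both be derived from userId and originalMessageId.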
Hope this helps!

MONGODB MULTI PARAMETER SEARCH QUERY

I have the following schema:
var ListingSchema = new Schema({
creatorId : [{ type: Schema.Types.ObjectId, ref: 'User' }],//LISTING CREATOR i.e. specific user
roommatePreference: { //preferred things in roommate
age: {//age preferences if any
early20s: { type: Boolean, default: true },
late20s: { type: Boolean, default: true },
thirtys: { type: Boolean, default: true },
fortysAndOld: { type: Boolean, default: true }
},
gender: {type:String,default:"Male"}
},
roomInfo: {//your own location of which place to rent
address: {type:String,default:"Default"},
city: {type:String,default:"Default"},
state: {type:String,default:"Default"},
zipcode: {type:Number,default:0},
},
location: {//ROOM LOCATION
type: [Number], // [<longitude>, <latitude>]
index: '2d' // create the geospatial index
},
pricing: {//room pricing information
monthlyRent: {type:Number,default:0},
deposit: {type:Number,default:0},
},
availability:{//room availability information
durationOfLease: {
minDuration: {type:Number,default:0},
maxDuration: {type:Number,default:0},
},
moveInDate: { type: Date, default: Date.now }
},
amneties : [{ type: Schema.Types.ObjectId, ref: 'Amnety' }],
rules : [{ type: Schema.Types.ObjectId, ref: 'Rule' }],
photos : [{ type: Schema.Types.ObjectId, ref: 'Media' }],//Array of photos having photo's ids, photos belong to Media class
description: String,//description of room for roomi
status:{type:Boolean,default:true}//STATUS OF ENTRY, BY DEFAULT ACTIVE=TRUE
},
{
timestamps:true
}
);
The application is like Airbnb/Roomi, where users can offer their rooms/places for rent. Now I want to implement a filter so that a user can find an appropriate room listing.
Here creatorId, rules, and amneties are reference ids into other schemas. I want to write a query that returns listings based on several parameters,
e.g. the user can pass rules, pricing info, some amneties, gender, etc. in the request query.
Which query parameters are present depends on the user.
Is there any way to do something like a nested query for this, the way we would in SQL?
Well, MongoDB is not made to be used as a relational DB.
Instead, I would suggest transforming the amneties array into an array of objects, with the amenities embedded inside the Listing schema,
so you can query as follows:
// Schema
ListSchema = mongoose.Schema({
....
amneties: [{aType: 'shower'}]
// or you can make it a simple array of strings:
// amneties: ['shower']
....
})
// query
Listings.find({'amneties.aType' : <some amenity>})
There are no SQL-style joins in a MongoDB find() query. Mongoose offers populate as a kind of "join", but it runs from your application server, and every population requires an extra round trip to the database.
If you still wish to use references to the amneties collection, you should query that collection first and then use the matching ids to find the listings.
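Since the question says the set of parameters depends on the user, one common approach is to build the filter object step by step from whatever was sent. The sketch below is only an illustration: the parameter names (maxRent, gender, amneties, rules) are hypothetical, the field paths come from the ListingSchema in the question, and it assumes ids arrive as comma-separated strings.

```javascript
// Sketch: build a MongoDB filter from whichever query parameters were sent.
// Parameter names (maxRent, gender, amneties, rules) are hypothetical;
// the field paths come from the ListingSchema in the question.
function buildListingFilter(query) {
  const filter = { status: true }; // only active listings
  if (query.maxRent !== undefined) {
    filter["pricing.monthlyRent"] = { $lte: Number(query.maxRent) };
  }
  if (query.gender) {
    filter["roommatePreference.gender"] = query.gender;
  }
  // Comma-separated ids; $all requires every listed id to be present.
  if (query.amneties) {
    filter.amneties = { $all: query.amneties.split(",") };
  }
  if (query.rules) {
    filter.rules = { $all: query.rules.split(",") };
  }
  return filter;
}

console.log(buildListingFilter({ maxRent: "900", amneties: "a1,a2" }));
```

The result could then be passed to find() on the listing model, for example with req.query as the input in an Express handler.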

Mongoose: 3 way document joining

I'm learning mongoose and need some help. I have 3 collections, and on a single API call, I want to create 3 documents that reference each other; "joins" below:
Users - need to reference chirps
Videos - need to reference chirps
Chirps - need to reference users & videos
Question: I know that I can do a model.create() and pass in the new document in each callback, and then update to the respective docs, but I was wondering if there's a cleaner way of doing it?
Sorry if I'm not clear on the question. Please ask me if something doesn't make sense.
Code
var chirpSchema = new mongoose.Schema({
date_created: { type: Date, default: Date.now }
, content: { post : String }
, _video: { type: $oid, ref: "video" }
, _author: { type: $oid, ref: "user" }
});
var chirp = mongoose.model('chirp', chirpSchema);
var userSchema = new mongoose.Schema({
date_joined: { type : Date, default: Date.now }
, cookie_id: String,
chirp_library: [{type: $oid, ref: "chirp"}]
})
var user = mongoose.model('user', userSchema);
var videoSchema = new mongoose.Schema({
date_tagged: { type : Date, default: Date.now }
, thumbnail_url : String
, _chirps: [{type: $oid, ref: "chirp" }]
});
var video = mongoose.model('video', videoSchema);
Mongo and other NoSQL databases aren't just interchangeable alternatives to a SQL database. They force you to rethink your design. The concept isn't to define a relationship; the idea is to make information available in fewer queries. Arrays are generally a thing to avoid in Mongo, especially if they can grow without bound. Based on your naming, that seems like a strong possibility. If you keep the rest of your schema and just delete those two arrays from your user and video schemas:
chirp.find({_author: yourUserId}).populate("_author") gives you the same information as user.findOne({_id: yourUserId}) in your current design.
similarly,
chirp.find({_video: yourVideoId}).populate("_video") and video.findOne({_id: yourVideoId})
The only issue with this is that the .populate() is running on every single chirp you are pulling. A way around this is to denormalize some (or all) of your author and video documents on the chirp document. How I would likely design this is:
var chirpSchema = new mongoose.Schema({
date_created: { type: Date, default: Date.now },
content: {
post : String
},
_author: {
_id: { type: $oid, ref: "user" },
date_joined: { type: Date, default: Date.now }
},
_video: {
_id: { type: $oid, ref: "video" },
date_tagged: { type: Date, default: Date.now },
thumbnail_url: String
}
});
var userSchema = new mongoose.Schema({
date_joined: { type : Date, default: Date.now }
, cookie_id: String
})
var videoSchema = new mongoose.Schema({
date_tagged: { type : Date, default: Date.now }
, thumbnail_url : String
});
It's perfectly OK to have data repeated, as long as it makes your queries more efficient. That being said, you need to strike a balance between reading and writing. If your user information or video information changes regularly, you'll have to update that data on each chirp document. If there is a specific field on your video/author that is changing regularly, you can just leave that field off the chirp document if it's not necessary in queries.
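To show what the denormalized write might look like, here is a plain JavaScript sketch. The function name buildChirpDoc and the sample values are made up. It copies the rarely-changing author and video fields onto the chirp and leaves cookie_id off, on the assumption that it isn't needed when reading chirps.

```javascript
// Sketch: copy the rarely-changing author and video fields onto the chirp.
// `user` and `video` are plain objects shaped like the schemas above.
function buildChirpDoc(user, video, post) {
  return {
    content: { post },
    _author: { _id: user._id, date_joined: user.date_joined },
    _video: {
      _id: video._id,
      date_tagged: video.date_tagged,
      thumbnail_url: video.thumbnail_url,
    },
  };
}

const user = { _id: "u1", date_joined: new Date("2014-01-01"), cookie_id: "abc" };
const video = { _id: "v1", date_tagged: new Date("2014-02-01"), thumbnail_url: "thumb.png" };
console.log(buildChirpDoc(user, video, "first chirp"));
```

The cost is on the write side: if thumbnail_url changes, every chirp that embeds that video has to be updated as well.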
