I have a very simple query in a NodeJS / Mongoose application:
const blocks = await Block
.find({
content: ObjectId(internalId),
})
.sort({ position: 1, _id: 1 })
with the schema:
const BlockSchema = mongoose.Schema({
id: String,
(...)
content: {
type: mongoose.Schema.Types.ObjectId,
ref: 'Domain',
index: true
},
concept: {
type: mongoose.Schema.Types.ObjectId,
ref: 'ConceptDetails',
index: true
},
conceptDetails: {
type: mongoose.Schema.Types.ObjectId,
ref: 'ConceptDetails',
index: true
},
creator: {
type: mongoose.Schema.Types.ObjectId,
ref: 'User'
}
});
const Block = mongoose.model(
'Block',
BlockSchema
);
The performance of this simple query was really bad with real data (around 900ms) so I added the following index:
db.blocks.createIndex({ position: 1, _id: 1 });
It improves the performance (around 330ms), but I expected something better for a query like that. FYI, I have 13,100 block items in the database.
Is there something else I can do to improve performance?
Thanks for your help!
This is because your query filters by content, which the {position: 1, _id: 1} index cannot serve. You can check this with explain(). Below is a visualization of the query plan on my local reproduction of your scenario. You can see COLLSCAN, which indicates the index is not used.
What can we do?
We can build another compound index that includes the content field to speed up the query. Make sure content comes before your sort fields position and _id, so the index can serve both the filter and the sort:
db.collection.createIndex({content: 1, position: 1, _id: 1 })
You can check again the query plan:
You can see the query plan changed to IXSCAN, which utilizes the new index. You can then expect the query to benefit from the index scan.
You can check out this official doc for more details on query coverage and optimization.
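You can also run explain() from Mongoose itself rather than the shell. A minimal sketch, assuming the Block model and internalId from the question (explainBlocksQuery is a hypothetical helper name):

```javascript
// Hedged sketch: confirm the winning plan is IXSCAN rather than COLLSCAN.
// `Block` and `internalId` are assumed to come from the question's code.
async function explainBlocksQuery(Block, internalId) {
  const plan = await Block.find({ content: internalId })
    .sort({ position: 1, _id: 1 })
    .explain('executionStats');
  // With the {content, position, _id} index in place, the winning plan's
  // stage (or its inputStage) should read 'IXSCAN' instead of 'COLLSCAN'.
  return plan;
}
```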
I need to increase the like and dislike counts in videoSchema based on likedislikeSchema; please tell me how to implement it.
For example, if a user likes a video, the like count should be incremented in videoSchema, and the videoId and userId should be added to likedislikeSchema. Based on likedislikeSchema, we then need to verify whether the user has already liked the video or not.
My Like Dislike Model
const likedislikeSchema = mongoose.Schema(
{
videoId: {
type: ObjectId,
ref: 'video',
},
userId: {
type: ObjectId,
ref: 'user',
},
like: {
type: Number,
default: 0,
},
dislike: {
type: Number,
default: 0,
},
},
{
timestamps: true,
}
)
const LikedislikeModel = mongoose.model('Likedislike', likedislikeSchema)
module.exports = LikedislikeModel
and based upon it, I need to increase the count in the Video schema:
const mongoose = require('mongoose')
const { ObjectId } = mongoose.Schema.Types
const videoSchema = mongoose.Schema(
{
title: {
type: String,
required: true,
},
description: {
type: String,
required: true,
},
likes: {
type: Number,
default: 0,
},
dislikes: {
type: Number,
default: 0,
},
},
{
timestamps: true,
}
)
const VideoModel = mongoose.model('Video', videoSchema)
module.exports = VideoModel
How to do that depends on how strongly consistent you want the reported like/dislike counts to be.
Loosely
If it is acceptable for the reported number of likes to be off by a few percent temporarily, i.e. corrected within a short period of time, you could do this update in multiple steps:
Create an index in the likedislikes collection on:
{
videoId: 1,
userId: 1
}
When processing a new like or dislike, use Model.findOneAndUpdate with both the upsert and rawResult options
If the user already has a like or dislike for that video, rawResult will cause the original document to be returned, so you can calculate the overall changes to the video's count
Use findOneAndUpdate in the videos collection to apply the changes
Since the above processes the updates as separate steps, it is possible for the first update to succeed but the second to fail. If you don't handle every possible variation of that in your code, the counts in the video document can drift out of sync with the likedislikes collection.
To remedy that, periodically iterate the videos, and run an aggregation to tally the count of likes and dislikes, and update the video document.
This will give you reasonably accurate counts normally, with precise counts when the periodic count is run.
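The delta calculation described above can be sketched as a pure function (computeVideoDeltas is a hypothetical helper; field names follow the question's schemas): given the previous likedislike document returned via rawResult, or null when the upsert inserted a new one, it computes the $inc values for the video.

```javascript
// Hypothetical helper: given the previous like/dislike document (null if the
// upsert inserted a new one) and the new vote, compute the increments to
// apply to the video's `likes`/`dislikes` counters via $inc.
function computeVideoDeltas(previousDoc, newVote) {
  const prev = previousDoc || { like: 0, dislike: 0 };
  return {
    likes: newVote.like - prev.like,
    dislikes: newVote.dislike - prev.dislike,
  };
}

// A user switching an earlier like to a dislike:
computeVideoDeltas({ like: 1, dislike: 0 }, { like: 0, dislike: 1 });
// → { likes: -1, dislikes: 1 }
```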
Strongly
If for some reason you cannot tolerate any imprecision in the counts, you should use multi-document transactions to ensure that both updates either succeed or fail together.
Note that with transactions there will be some document-locking, so individual updates may take a bit longer than the loose option.
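A minimal sketch of the transactional variant, assuming the models from the question and a mongoose connection created elsewhere (recordVote is a hypothetical name; error handling is reduced to the bare minimum):

```javascript
// Sketch: both writes commit or abort together inside one transaction.
// `connection`, `LikedislikeModel` and `VideoModel` are assumed to be the
// question's mongoose connection and models, passed in from elsewhere.
async function recordVote(connection, LikedislikeModel, VideoModel, vote) {
  const { videoId, userId, like, dislike } = vote;
  const session = await connection.startSession();
  try {
    await session.withTransaction(async () => {
      // findOneAndUpdate returns the pre-update document by default,
      // or null when the upsert inserted a new one.
      const previous = await LikedislikeModel.findOneAndUpdate(
        { videoId, userId },
        { $set: { like, dislike } },
        { upsert: true, session }
      );
      const prev = previous || { like: 0, dislike: 0 };
      await VideoModel.updateOne(
        { _id: videoId },
        { $inc: { likes: like - prev.like, dislikes: dislike - prev.dislike } },
        { session }
      );
    });
  } finally {
    await session.endSession();
  }
}
```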
I have a collection that will insert a few million documents every year. My collection looks like this (using mongoose):
var mongoose = require("mongoose");
var Schema = mongoose.Schema;
var MySchema = new Schema({
schoolID: {
type: mongoose.Schema.Types.ObjectId, ref: 'School'
},
kelasID: {
type: mongoose.Schema.Types.ObjectId, ref: 'Kelas'
},
studentID: {
type: mongoose.Schema.Types.ObjectId, ref: 'Students'
},
positiveID: {
type: mongoose.Schema.Types.ObjectId, ref: 'Positive'
},
teacherID: {
type: mongoose.Schema.Types.ObjectId, ref: 'User'
},
actionName: {
type: String,
},
actionDate: {
type: String
},
actionTime: {
type: String
},
actionMonth: {
type: Number
},
actionYear: {
type: Number
},
points: {
type: Number
},
multiply: {
type: Number
},
totalPoints: {
type: Number
},
dataType: {
type: Number,
default: 1 //1-normal, 2-attendance, 3-notifications, 4-parent app
},
remarks: {
type: String,
},
remarks2: {
type: String,
},
status: {
type: Number, //28 Dec 2018: Currently only used when dataType=2 (attendance). 1:On-Time, 2:Late
},
});
// note: a compound index must be declared as a single object; separate
// objects are interpreted by mongoose as (fields, options)
MySchema.index({ schoolID: 1, kelasID: 1, studentID: 1, positiveID: 1, actionDate: 1 });
module.exports = mongoose.model('Submission', MySchema);
As the collection grows, querying it gets slower. I have been thinking of manually creating a new collection for each year starting next year (so it would be named Submission2021, Submission2022, and so on), but to do this I would need to modify quite a lot of code, not to mention the hassle of doing something like
var mySubmission;
if (year === 2021) {
    mySubmission = new Submission2021();
} else if (year === 2022) {
    mySubmission = new Submission2022();
} else if (year === 2023) {
    mySubmission = new Submission2023();
}
mySubmission.schoolID = 123
mySubmission.kelasID = 321
mySubmission.save()
So would an index based on year be better for me? My queries involve a lot of searching by schoolID, kelasID, studentID, positiveID, teacherID, and actionDate, so I don't think creating a compound index with year plus all the other fields is a good idea, right?
Only analytical column stores offer generally good performance for queries across any dimension, so you will have to consider this basic tradeoff: how many indexes you wish to create vs. insert speed. In MongoDB, compound indexes work "left to right", so given an index created like this:
db.collection.createIndex({year:1, schoolID:1, studentID:1})
then find({year:2020}), find({year:2020,schoolID:"S1"}), and find({year:2020,schoolID:"S1",studentID:"X1"}) will all run fast, and the last one will run really fast because it is practically unique. But find({schoolID:"S1"}) will not, because the "leading" component year is not present. You can of course create multiple indexes. Another thing to consider is studentID: a student accumulates many submissions, and looking them up directly is a natural access pattern, as is narrowing a search by year. I might recommend starting with these two indexes:
db.collection.createIndex({studentID:1}); // not unique: a student has many submissions
db.collection.createIndex({year:1, schoolID:1}); // compound
These will produce "customary and expected" query results rapidly. Of course, you can add more indexes, and at a few million inserts per year, I don't think you have to worry about insert performance.
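The "left to right" rule can be illustrated with a small sketch (canUsePrefix is a hypothetical helper, not a MongoDB API): a compound index can fully serve a query only when the query's equality fields form a prefix of the index's field list.

```javascript
// Hypothetical illustration of the index prefix rule — not a MongoDB API.
function canUsePrefix(indexFields, queryFields) {
  const wanted = new Set(queryFields);
  let covered = 0;
  for (const field of indexFields) {
    if (wanted.has(field)) covered += 1;
    else break; // stop at the first index field the query doesn't constrain
  }
  return covered === queryFields.length;
}

const idx = ['year', 'schoolID', 'studentID'];
console.log(canUsePrefix(idx, ['year']));               // true
console.log(canUsePrefix(idx, ['year', 'schoolID']));   // true
console.log(canUsePrefix(idx, ['schoolID']));           // false — leading field missing
```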
What is the best way to model a retweet schema in MongoDB? It is important that I have createdAt times of both the original message and the time when the retweet occurred, because of pagination; I use createdAt as the cursor for a GraphQL query.
I also need a flag for whether the message itself is a retweet or an original, plus id references to the original message, the original user, and the reposting user.
I came up with 2 solutions. The first is that I keep the ids of reposters and createdAt in an array in the Message model. The downside is that I have to generate the timeline every time, and for subscriptions it's not clear which message to push to the client.
The second is that I treat a retweet as a message in its own right. I have createdAt and reposterId in place, but there is a lot of duplication: if I were to add a like to a message, I'd have to push it into the array of every single retweet.
I could use some help with this: what is the most efficient way to do it in MongoDB?
First way:
import mongoose from 'mongoose';
const messageSchema = new mongoose.Schema(
{
text: {
type: mongoose.Schema.Types.String,
required: true,
},
userId: {
type: mongoose.Schema.Types.ObjectId,
ref: 'User',
required: true,
},
likesIds: [{ type: mongoose.Schema.Types.ObjectId, ref: 'User' }],
reposts: [
{
reposterId: {
type: mongoose.Schema.Types.ObjectId,
ref: 'User',
},
createdAt: { type: Date, default: Date.now },
},
],
},
{
timestamps: true,
},
);
const Message = mongoose.model('Message', messageSchema);
Second way:
import mongoose from 'mongoose';
const messageSchema = new mongoose.Schema(
{
text: {
type: mongoose.Schema.Types.String,
required: true,
},
userId: {
type: mongoose.Schema.Types.ObjectId,
ref: 'User',
required: true,
},
likesIds: [{ type: mongoose.Schema.Types.ObjectId, ref: 'User' }],
isReposted: {
type: mongoose.Schema.Types.Boolean,
default: false,
},
repost: {
reposterId: {
type: mongoose.Schema.Types.ObjectId,
ref: 'User',
},
originalMessageId: {
type: mongoose.Schema.Types.ObjectId,
ref: 'Message',
},
},
},
{
timestamps: true,
},
);
const Message = mongoose.model('Message', messageSchema);
export default Message;
Option 2 is the better choice here. I'm operating with the assumption that this is Twitter-retweet or Facebook-share style functionality. You refer to it as both retweet and repost, so I'll stick to "repost" here.
Option 1 creates an efficiency problem: to find reposts for a user, the db needs to iterate over the repost arrays of every document in the message collection to ensure it found all of the reposterIds. Storing ids in mongo arrays in collection X referencing collection Y is great if you want to traverse from X to Y. It's not as nice if you want to traverse from Y to X.
With option 2, you can specify a more classic one-to-many relationship between messages and reposts that will be simpler and more efficient to query. Reposts and non-repost messages alike will ultimately be placed into messageSchema in the order the user made them, making organization easier. Option 2 also makes it easy to allow reposting users to add text of their own to the repost, where it can be displayed alongside the repost in the view this feeds into. This is popular on facebook where people add context to the things they share.
My one question is, why are three fields being used to track reposts in Option 2?
isReposted, repost.reposterId and repost.originalMessageId provide redundant data. All you should need is an originalMessageId field that, if not null, contains a messageSchema key and, if null, signifies that the message is not itself a repost. If you really need it, the userId of the original message's creator can be found in that message when you query for it.
Hope this helps!
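To make the single-field suggestion concrete, here is a sketch (plain objects stand in for documents; isRepost is a hypothetical helper): originalMessageId doubles as the repost flag, so no separate isReposted or reposterId is needed, since the repost's own userId is the reposter.

```javascript
// Sketch: a message is a repost iff originalMessageId is set.
function isRepost(message) {
  return message.originalMessageId != null;
}

console.log(isRepost({ text: 'hello', userId: 'u1' }));            // false
console.log(isRepost({ userId: 'u2', originalMessageId: 'm1' }));  // true
```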
I have the following method in a little node/express app :
async getAll(req, res) {
const movies = await movieModel
.find()
.populate({path: 'genres', select: 'name'})
.skip(0)
.limit(15);
return res.send(movies);
};
With the following schema :
const MovieSchema = new mongoose.Schema({
externalId: { required: true, type: Number },
title: { required: true, type: String },
genres: [{ ref: "Genre", type: mongoose.Schema.Types.ObjectId }],
releaseDate: {type: Date},
originalLanguage: {type : String},
originalTitle: {type : String},
posterPath: {type : String},
backdropPath: {type : String},
overview: {type: String},
comments: [{ ref: "Comment", type: mongoose.Schema.Types.ObjectId }],
votes: [VoteSchema]
}, { timestamps: true });
MovieSchema.virtual("averageNote").get(function () {
let avg = 0;
if (this.votes.length == 0) {
return '-';
}
this.votes.forEach(vote => {
avg += vote.note;
});
avg = avg / this.votes.length;
return avg.toFixed(2);
});
MovieSchema.set("toJSON", {
transform: (doc, ret) => {
ret.id = ret._id;
delete ret._id;
delete ret.__v;
},
virtuals: true,
getters: true
});
However, the query always returns all documents.
I also tried adding exec() at the end of the query, and .populate({path: 'genres', select: 'name', options: {skip: 0, limit: 15} }), but to no avail.
I tried on another schema which is simpler, and skip/limit worked just fine, so the issue probably comes from my schema, but I can't figure out where the problem is.
I also tried with the virtual field commented out, but limit and sort were still not applied.
My guess is that it comes from votes: [VoteSchema], since it's the first time I've used this, but it was recommended by my teacher, as using ref
isn't recommended in a non-relational database. Furthermore, in order to calculate averageNote as a virtual field, I have no other choice.
EDIT: I just tried again with votes: [{ ref: "Vote", type: mongoose.Schema.Types.ObjectId }] and I still can't limit or skip.
Node version : 10.15.1
MongoDB version : 4.0.6
Mongoose version : 5.3.1
Let me know if I should add any other informations
This is really about how .populate() works and why the order of "chained methods" matters here. But in brief:
const movies = await movieModel
.find()
.skip(0)
.limit(15)
.populate({path: 'genres', select: 'name'}) // alternately .populate('genres','name')
.exec()
The problem is that .populate() really just runs another query against the database to "emulate" a join. It has nothing to do with the original .find(), since all populate() does is take the results of the query and use certain values to "look up" documents in another collection with that other query. Importantly, this population runs after the main query results are retrieved.
The .skip() and .limit() calls, on the other hand, are cursor modifiers and directly part of the underlying MongoDB driver. They belong to the .find(), and as such need to be applied in that sequence.
The MongoDB driver part of the builder is forgiving, in that:
.find().limit(15).skip(0)
is also acceptable, because the options are passed in "all at once", but it's good practice to think of it as skip then limit, in that order.
Overall, the populate() method must be the last thing on the chain, after any cursor modifiers such as limit() or skip().
Hey I have a MongoDB database and I'm using Node.js with Mongoose.
I have a collection which mongoose schema looks like this:
{
location2d: {
type: [Number], //[<longitude>, <latitude>]
index: '2dsphere'
},
name: String,
owner: {type: Schema.Types.ObjectId, ref: 'Player', index: true},
}
This collection is quite big (500,000 documents). When I do a simple nearest query, it runs quite fast (~10 ms).
But when I do something like this:
this.find({owner: {$ne:null}})
.where('location2d')
.near({center: [center.lon, center.lat], maxDistance: range / 6380000, spherical: true})
.limit(10)
.select('owner location2d')
.exec()
it takes a very long time, about 60 seconds! Just adding {owner: {$ne: null}} to the find multiplies the time required by 6,000.
What am I doing wrong? How can I improve this?
When I do a search by owner it's fast, when I do a search by proximity it's fast but when I combine both it's unbelievably slow.
Any clue?
OK, I found a solution; a little bit dirty, but very fast.
1st: create a new field called ownerIdAsInt, which is the integer parsed from the owner ObjectId's hex string: document.ownerIdAsInt = parseInt(document.owner.toString(), 16). If owner is null, set the field to 0.
2nd: define a compound index with {ownerIdAsInt: 1, location2d: "2dsphere"}.
Your schema should look like this:
var schema = new Schema({
location2d: {
type: [Number], //[<longitude>, <latitude>]
index: '2dsphere'
},
name: String,
owner: {type: Schema.Types.ObjectId, ref: 'Player', index: true},
ownerIdAsInt: Number
});
schema.index({ownerIdAsInt: 1, location2d: "2dsphere"});
and the query now is:
this.find({ownerIdAsInt: {$gt: 0},
location2d: {$nearSphere: [center.lon, center.lat],
$maxDistance: range / 6380000}})
.limit(10)
.select('owner location2d')
.exec()
Results now come back in ~20 ms. Much faster!
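The hex-to-integer conversion in step 1 can be sketched as a plain function (the ObjectId string below is a made-up example). Note that parseInt on a 24-character hex string loses precision, but that doesn't matter here, since the query only distinguishes 0 from {$gt: 0}:

```javascript
// Sketch of step 1: derive the numeric ownerIdAsInt field from an ObjectId.
// Precision loss is acceptable: the value is only compared against 0.
function ownerIdAsInt(owner) {
  return owner == null ? 0 : parseInt(owner.toString(), 16);
}

console.log(ownerIdAsInt(null));                            // 0
console.log(ownerIdAsInt('5d1f8c2ab43fa12e04d02f00') > 0);  // true
```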