Sorting by virtual field in mongoDB (mongoose) - node.js

Let's say I have a schema with a virtual field like this:
var schema = new mongoose.Schema(
{
name: { type: String }
},
{
toObject: { virtuals: true },
toJSON: { virtuals: true }
});
schema.virtual("name_length").get(function(){
return this.name.length;
});
Is it possible to sort the results of a query by the virtual field? Something like:
schema.find().sort("name_length").limit(5).exec(function(docs){ ... });
When I try this, the results are simply not sorted...

You won't be able to sort by a virtual field because virtuals are not stored in the database.
Virtual attributes are attributes that are convenient to have around
but that do not get persisted to mongodb.
http://mongoosejs.com/docs/2.7.x/docs/virtuals.html
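Since the virtual only exists once the documents are loaded, one fallback is to sort in application code after the query returns. The following is a minimal sketch, not part of Mongoose: sortByNameLength is a hypothetical helper that recomputes what the name_length virtual returns. Note that a limit has to be applied after the sort, so every matching document must be fetched first.

```javascript
// Hypothetical helper: sorts plain documents by the length of `name`,
// mirroring what the `name_length` virtual computes.
function sortByNameLength(docs, limit) {
  // Treat a missing or non-string name as length 0 (an assumption).
  const length = (doc) => (typeof doc.name === 'string' ? doc.name.length : 0);
  // Copy first so the caller's array is left untouched.
  const sorted = [...docs].sort((a, b) => length(a) - length(b));
  return limit === undefined ? sorted : sorted.slice(0, limit);
}

const sample = [{ name: 'Charlotte' }, { name: 'Al' }, { name: 'Maria' }];
console.log(sortByNameLength(sample, 2).map((doc) => doc.name)); // [ 'Al', 'Maria' ]
```

This is only reasonable for small result sets; for anything larger the sorting should happen inside the database.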

Virtuals defined in the Schema are not injected into the generated MongoDB queries. The functions defined are simply run for each document at the appropriate moments, once they have already been retrieved from the database.
In order to reach what you're trying to achieve, you'll also need to define the virtual field within the MongoDB query. For example, in the $project stage of an aggregation.
There are, however, a few things to keep in mind when sorting by virtual fields:
Projected documents exist only in memory, so simply adding a field and sorting would hold every matching document, in full, in memory before sorting, which comes at a huge performance cost.
Because of the above, indexes will not be used at all when sorting.
Here's a general example on how to sort by virtual fields while keeping a relatively good performance:
Imagine you have a collection of teams and each team contains an array of players directly stored into the document. Now, the requirement asks for us to sort those teams by the ranking of the favoredPlayer where the favoredPlayer is basically a virtual property containing the most relevant player of the team under certain criteria (in this example we only want to consider offense and defense players). Also, the aforementioned criteria depend on the users' choices and can, therefore, not be persisted into the document.
To top it off, our "team" document is pretty large, so in order to mitigate the performance hit of sorting in-memory, we project only the fields we need for sorting and then restore the original document after limiting the results.
The query:
[
// find all teams from germany
{ '$match': { country: 'de' } },
// project only the sort-relevant fields
// and add the virtual favoredPlayer field to each team
{ '$project': {
rank: 1,
'favoredPlayer': {
'$arrayElemAt': [
{
// keep only players that match our criteria
$filter: {
input: '$players',
as: 'p',
cond: { $in: ['$$p.position', ['offense', 'defense']] },
},
},
// take first of the filtered players since players are already sorted by relevance in our db
0,
],
},
}},
// sort teams by the ranking of the favoredPlayer
{ '$sort': { 'favoredPlayer.ranking': -1, rank: -1 } },
{ '$limit': 10 },
// $lookup, $unwind, and $replaceRoot are in order to restore the original database document
{ '$lookup': { from: 'teams', localField: '_id', foreignField: '_id', as: 'subdoc' } },
{ '$unwind': { path: '$subdoc' } },
{ '$replaceRoot': { newRoot: '$subdoc' } },
];
For the example you gave above, the code could look something like the following:
var schema = new mongoose.Schema(
{ name: { type: String } },
{
toObject: { virtuals: true },
toJSON: { virtuals: true },
});
schema.virtual('name_length').get(function () {
return this.name.length;
});
const MyModel = mongoose.model('Thing', schema);
MyModel
.aggregate()
.project({
'name_length': {
'$strLenCP': '$name',
},
})
.sort({ 'name_length': -1 })
.exec(function(err, docs) {
console.log(docs);
});
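A slightly more defensive variant of the same idea, as a sketch: $strLenCP raises an error when the field is missing or null, so the pipeline below substitutes an empty string with $ifNull, and it uses $addFields so the other fields are kept. buildNameLengthPipeline is a hypothetical helper name, and the ascending direction is an assumption taken from the question.

```javascript
// Hypothetical helper that builds the aggregation stages as plain data.
function buildNameLengthPipeline(limit) {
  return [
    // Compute the length in the database; fall back to '' when name is missing.
    { $addFields: { name_length: { $strLenCP: { $ifNull: ['$name', ''] } } } },
    // 1 = shortest first; use -1 for longest first.
    { $sort: { name_length: 1 } },
    { $limit: limit },
  ];
}

const pipeline = buildNameLengthPipeline(5);
console.log(JSON.stringify(pipeline, null, 2));
// With a Mongoose model this could be run as: MyModel.aggregate(pipeline).exec()
```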

Related

How can I optimize my MongoDB Upsert statement?

A decision was made to switch our database from SQL to noSQL and I have a few questions on best practices and if my current implementation could be improved.
My current SQL implementation for upserting player data after a game.
let template = Players.map(
(player) =>
`(
${player.Rank},"${player.Player_ID}","${player.Player}",${player.Score},${tpp},1
)`,
).join(',');
let stmt = `INSERT INTO playerStats (Rank, Player_ID, Player, Score, TPP, Games_Played)
VALUES ${template}
ON CONFLICT(Player_ID) DO UPDATE
SET Score = Score+excluded.Score,
Games_Played=Games_Played+1,
TPP=TPP+excluded.TPP`;
db.run(stmt, function (upsert_error) { ...
The code is expected to update existing players by checking whether the current Player_ID exists. If so, it updates their score, among other things. Otherwise it inserts a new player.
Mongo Implementation
const players = [
{ name: 'George', score: 10, id: 'g65873' },
{ name: 'Wayne', score: 100, id: 'g63853' },
{ name: 'Jhonny', score: 500, id: 'b1234' },
{ name: 'David', score: 3, id: 'a5678' },
{ name: 'Dallas', score: 333333, id: 'a98234' },
];
const db = client.db(dbName);
const results = players.map((player) => {
// updateOne(query, update, options)
db.collection('Players')
.updateOne(
{ Player_Name: player.name },
{
$setOnInsert: { Player_Name: player.name, id: player.id },
$inc: { Score: player.score },
},
{ upsert: true, multi: true },
);
});
Is there a better way in mongo to implement this? I tried using updateMany and bulkUpdate and I didn't get the results I expected.
Are there any tips, tricks, or resources aside from the mongo.db that you would recommend for those moving from SQL to noSQL?
Thanks again!
Your approach is fine. However, there are a few flaws:
The updateOne command updates exactly one document, as the name implies, so multi: true has no effect here.
Field names are case-sensitive (unlike in most SQL databases). It should be $inc: { score: player.score }, not "Score".
The field Player_Name does not exist, so the filter will never find a document to update.
So, your command should be like this:
db.collection('Players').updateOne(
{ name: player.name }, //or { id: player.id } ?
{
$setOnInsert: { name: player.name, id: player.id },
$inc: { score: player.score },
},
{ upsert: true }
)
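To avoid one round trip per player, the same upserts can be sent in a single bulkWrite call. This is a sketch under assumptions: buildUpsertOps is a hypothetical helper, the filter uses id rather than name, and games_played is an assumed field mirroring Games_Played from the SQL version.

```javascript
// Hypothetical helper: turns the players array into bulkWrite operations.
function buildUpsertOps(players) {
  return players.map((player) => ({
    updateOne: {
      // On insert, the equality fields of the filter (id) are copied
      // into the new document, so only name needs $setOnInsert.
      filter: { id: player.id },
      update: {
        $setOnInsert: { name: player.name },
        // games_played is an assumed field, not part of the original Mongo code.
        $inc: { score: player.score, games_played: 1 },
      },
      upsert: true,
    },
  }));
}

const ops = buildUpsertOps([
  { name: 'George', score: 10, id: 'g65873' },
  { name: 'Wayne', score: 100, id: 'g63853' },
]);
console.log(JSON.stringify(ops[0]));
// Usage with the Node.js driver (inside an async function):
// await db.collection('Players').bulkWrite(ops, { ordered: false });
```

With ordered: false the server does not stop at the first failing operation, which usually suits independent per-player updates.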
In my experience, moving from SQL to NoSQL is harder if you try to translate the SQL statements you have in mind into NoSQL commands one by one. For me it worked better to set the SQL idea aside and to understand and develop the NoSQL command from scratch.
Of course, when you do your first find, delete, insert and update you will see many analogies to SQL, but by the time you reach the aggregation framework at the latest, you are lost if you try to translate between it and SQL.

How can I mix a populated ObjectId with a string

Actually, in the database I got a job that I request with a GET route:
So when I populate candidates I got this response format :
My problem here is I don't need that "id" object, I just need a "selected_candidates" array with users inside as objects. Actually it's an object, in another object that is in an Array.
Here the code from my controller (the populate is in the jobsService):
If I change the data format of the job like that way:
...It is working great (with a path: "candidates_selected") like expected BUT I don't have that "status" string (Normal because I don't have it anymore in the DataBase. Because of ObjectId):
I would like a solution to have them both, but maybe it's the limit of noSQL?
A solution without populate but with a Loop (I don't think it's a good idea):
I think there is no convenient way to achieve it. However, you may try the aggregation framework from the native MongoDB driver.
Let your Mongoose schemas be ASchema and BSchema
const result = await ASchema.aggregate([
{$addFields: {org_doc: '$$ROOT'}}, // save original document to retrieve later
{$unwind: '$candidates_selected'},
{
$lookup: {
from: BSchema.collection.name,
let: {
selected_id: '$candidates_selected.id',
status: '$candidates_selected.status',
},
pipeline: [
{
$match: {$expr: {$eq: ['$$selected_id', '$_id']}}, // find candidate by id
},
{
$addFields: {status: '$$status'} // attach status
}
],
as: 'found_candidate'
}
},
{
$group: { // regroup the very first $unwind stage
_id: '$_id',
org_doc: {$first: '$org_doc'},
found_candidates: {
$push: {$arrayElemAt: ['$found_candidate', 0]} // result of $lookup is an array, concat them to reform the original array
}
}
},
{
$addFields: {'org_doc.candidates_selected': '$found_candidates'} // attach found_candidates to the saved original document
},
{
$replaceRoot: {newRoot: '$org_doc'} // recover the original document
}
])
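If the pipeline feels heavy, the same merge can be sketched in application code with one extra query for all candidates rather than one query per candidate. attachCandidates is a hypothetical helper; it assumes each entry of candidates_selected has the id and status fields used in the pipeline above.

```javascript
// Hypothetical helper: replaces each { id, status } entry with the
// matching user document plus its status.
function attachCandidates(job, users) {
  // Index users by their stringified _id (works for ObjectId and strings).
  const usersById = new Map(users.map((user) => [String(user._id), user]));
  const candidates = job.candidates_selected
    // Skip entries whose user no longer exists.
    .filter((entry) => usersById.has(String(entry.id)))
    .map((entry) => ({ ...usersById.get(String(entry.id)), status: entry.status }));
  return { ...job, candidates_selected: candidates };
}

// Usage idea (User is an assumed Mongoose model, job a lean job document):
// const ids = job.candidates_selected.map((entry) => entry.id);
// const users = await User.find({ _id: { $in: ids } }).lean();
// const result = attachCandidates(job, users);
```

Unlike the $group stage, this keeps the original order of candidates_selected.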

MongoDB: Dynamic Counts

I have two collections. A 'users' collection and an 'events' collection. There is a primary key on the events collection which indicates which user the event belongs to.
I would like to count how many events a user has matching a certain condition.
Currently, I am performing this like:
db.users.find({ usersMatchingACondition }).forEach(user => {
const eventCount = db.events.find({
title: 'An event title that I want to find',
userId: user._id
}).count();
print(`This user has ${eventCount} events`);
});
Ideally what I would like returned is an array or object with the UserID and how many events that user has.
With 10,000 users - this is obviously producing 10,000 queries and I think it could be made a lot more efficient!
I presume this is easy with some kind of aggregate query - but I'm not familiar with the syntax and am struggling to wrap my head around it.
Any help would be greatly appreciated!
You need $lookup to get the data from events matched by userId. Then you can use $filter to apply your event-level condition, and the $size operator to get the count:
db.users.aggregate([
{
$match: { /* users matching condition */ }
},
{
$lookup:
{
from: 'events',
localField: '_id', //your "primary key"
foreignField: 'userId',
as: 'user_events'
}
},
{
$addFields: {
user_events: {
$filter: {
input: "$user_events",
cond: {
$eq: [
'$$this.title', 'An event title that I want to find'
]
}
}
}
}
},
{
$project: {
_id: 1,
// other fields you want to retrieve: 1,
totalEvents: { $size: "$user_events" }
}
}
])
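An alternative sketch starts from the events side and groups by userId, which avoids pulling every event of every user into the users documents. buildEventCountPipeline and countEventsInMemory are hypothetical helpers; the second only mirrors the pipeline in memory to show the result shape. Users with no matching events do not appear in the output.

```javascript
// Hypothetical helper: pipeline to run on the events collection.
function buildEventCountPipeline(userIds, title) {
  return [
    { $match: { title, userId: { $in: userIds } } },
    { $group: { _id: '$userId', totalEvents: { $sum: 1 } } },
  ];
}

// In-memory mirror of the pipeline, only to illustrate what it computes.
function countEventsInMemory(events, userIds, title) {
  const counts = new Map();
  for (const event of events) {
    if (event.title === title && userIds.includes(event.userId)) {
      counts.set(event.userId, (counts.get(event.userId) || 0) + 1);
    }
  }
  return [...counts].map(([_id, totalEvents]) => ({ _id, totalEvents }));
}

// Usage idea in the shell:
// const ids = db.users.distinct('_id', { /* users matching condition */ });
// db.events.aggregate(buildEventCountPipeline(ids, 'An event title that I want to find'));
```

An index on { title: 1, userId: 1 } in events would likely help the $match stage, though that depends on the data.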
There isn't much optimization that can be done without aggregate, but the existing loop can be tidied up.
First, instead of
const eventCount = db.events.find({
title: 'An event title that I want to find',
userId: user._id
}).count();
Do
const eventCount = db.events.count({
title: 'An event title that I want to find',
userId: user._id
});
This expresses the count directly. Do not expect a large speed-up from it, though: in the mongo shell, find(...).count() issues a count command as well and does not fetch the documents first.
For returning an array you can just initialize an array at the start and push {userid: id, count: eventCount} objects to it.

Conditional update, depending on field matched

Say I have a collection of documents, each one managing a discussion between a teacher and a student:
{
_id,
teacherId,
studentId,
teacherLastMessage,
studentLastMessage
}
I will get queries with 3 parameters: an _id, a userId and a message.
I'm looking for a way to update the teacherLastMessage field or studentLastMessage field depending on which one the user is.
At the moment, I have this:
return Promise.all([
// if user is teacher, set teacherLastMessage
db.collection('discussions').findOneAndUpdate({
teacherId: userId,
_id
}, {
$set: {
teacherLastMessage: message
}
}, {
returnOriginal: false
}),
// if user is student, set studentLastMessage
db.collection('discussions').findOneAndUpdate({
studentId: userId,
_id
}, {
$set: {
studentLastMessage: message
}
}, {
returnOriginal: false
})
]).then((results) => {
results = results.filter((result) => result.value);
if (!results.length) {
throw new Error('No matching document');
}
return results[0].value;
});
Is there a way to tell mongo to make a conditional update, based on the field matched? Something like this:
db.collection('discussions').findOneAndUpdate({
$or: [{
teacherId: userId
}, {
studentId: userId
}],
_id
}, {
$set: {
// if field matched was studentId, set studentLastMessage
// if field matched was teacherId, set teacherLastMessage
}
});
Surely it must be possible with mongo 3.2?
What you want would require referencing other fields inside of $set. This is currently impossible. Refer to this ticket as an example.
First of all, your current approach with two update queries looks just fine to me. You can continue using that, just make sure that you have the right indexes in place. Namely, to get the best performance for these updates, you should have two compound indexes:
{ _id: 1, teacherId: 1 }
{ _id: 1, studentId: 1 }
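This does not help on 3.2, but on MongoDB 4.2 or newer an update can be written as an aggregation pipeline, and a pipeline $set can reference other fields. A minimal sketch, where buildLastMessageUpdate is a hypothetical helper and $literal keeps the user-supplied values from being read as field paths:

```javascript
// Hypothetical helper: builds a pipeline-style update (MongoDB 4.2+).
function buildLastMessageUpdate(userId, message) {
  // Set messageField to the new message only when idField matches userId;
  // otherwise keep the field's current value.
  const setIfMatches = (idField, messageField) => ({
    $cond: [
      { $eq: [`$${idField}`, { $literal: userId }] },
      { $literal: message },
      `$${messageField}`,
    ],
  });
  return [
    {
      $set: {
        teacherLastMessage: setIfMatches('teacherId', 'teacherLastMessage'),
        studentLastMessage: setIfMatches('studentId', 'studentLastMessage'),
      },
    },
  ];
}

// Usage idea with the Node.js driver:
// db.collection('discussions').findOneAndUpdate(
//   { _id, $or: [{ teacherId: userId }, { studentId: userId }] },
//   buildLastMessageUpdate(userId, message)
// );
```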
To look at this from another perspective, you should probably restructure your data. For example:
{
_id: '...',
users: [
{
userId: '...',
userType: 'student',
lastMessage: 'lorem ipsum'
},
{
userId: '...',
userType: 'teacher',
lastMessage: 'dolor sit amet'
}
]
}
This would allow you to perform your update with a single query.
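With that structure, the single query could look roughly like the sketch below, which relies on the positional $ operator to address the array element matched by the filter. buildUserMessageUpdate is a hypothetical helper name.

```javascript
// Hypothetical helper: filter and update for the restructured document.
function buildUserMessageUpdate(discussionId, userId, message) {
  return {
    // Match the discussion and the array element belonging to this user.
    filter: { _id: discussionId, 'users.userId': userId },
    // `$` refers to the first users element matched by the filter.
    update: { $set: { 'users.$.lastMessage': message } },
  };
}

const { filter, update } = buildUserMessageUpdate('d1', 'u1', 'lorem ipsum');
console.log(JSON.stringify({ filter, update }));
// Usage idea:
// db.collection('discussions').findOneAndUpdate(filter, update, { returnOriginal: false });
```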
Your data structure is a bit weird. Unless you have a specific business case which requires the data to be molded that way, I would suggest creating a userType field. If a user can be both a teacher and a student, then keep your structure.
The $set operator takes an object, so my suggestion is to do your business logic beforehand. You should already know before the update whether it is for a teacher or a student; some variable or authentication level should distinguish teachers from students. Perhaps you could set a cookie or local storage entry in the callback of a successful login. Regardless, if you know the current type of user, you can build the object earlier: make an object literal with the properties you need based on the user type.
So:
var updateObj;
if (student) {
updateObj = { studentLastMessage: msg };
} else {
updateObj = { teacherLastMessage: msg };
}
Then pass updateObj as the value of $set, i.e. { $set: updateObj }.
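The same idea can be sketched with a computed property name. buildRoleUpdate and the role strings are assumptions for illustration; the field names follow the schema from the question.

```javascript
// Hypothetical helper: picks the field to set from the user's role.
function buildRoleUpdate(role, message) {
  let field;
  if (role === 'teacher') {
    field = 'teacherLastMessage';
  } else if (role === 'student') {
    field = 'studentLastMessage';
  } else {
    throw new Error(`Unknown role: ${role}`);
  }
  // Computed key: { $set: { teacherLastMessage: message } } or the student variant.
  return { $set: { [field]: message } };
}

console.log(buildRoleUpdate('student', 'hi')); // { '$set': { studentLastMessage: 'hi' } }
// Usage idea:
// db.collection('discussions').findOneAndUpdate({ _id, studentId: userId }, buildRoleUpdate('student', message));
```

The filter should still include the matching teacherId or studentId so that a user cannot update a discussion they do not belong to.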

mongoose subdocument sorting

I have an article schema that has a subdocument array, comments, which contains all the comments I got for this particular article.
What I want to do is select an article by id, populate its author field and also the author field in comments, then sort the comments subdocument by date.
The article schema:
var articleSchema = new Schema({
title: { type: String, default: '', trim: true },
body: { type: String, default: '', trim: true },
author: { type: Schema.ObjectId, ref: 'User' },
comments: [{
body: { type: String, default: '' },
author: { type: Schema.ObjectId, ref: 'User' },
created_at: { type : Date, default : Date.now, get: getCreatedAtDate }
}],
tags: { type: [], get: getTags, set: setTags },
image: {
cdnUri: String,
files: []
},
created_at: { type : Date, default : Date.now, get: getCreatedAtDate }
});
Static method on the article schema (I would love to sort the comments here; can I do that?):
load: function (id, cb) {
this.findOne({ _id: id })
.populate('author', 'email profile')
.populate('comments.author')
.exec(cb);
},
I have to sort it elsewhere:
exports.load = function (req, res, next, id) {
var User = require('../models/User');
Article.load(id, function (err, article) {
var sorted = article.toObject({ getters: true });
sorted.comments = _.sortBy(sorted.comments, 'created_at').reverse();
req.article = sorted;
next();
});
};
I call toObject to convert the document to a JavaScript object. I can keep my getters / virtuals, but what about methods?
Anyway, I do the sorting logic on the plain object and I'm done.
I am quite sure there is a much better way of doing this; please let me know.
I could have written this out as a few things, but on consideration "getting the mongoose objects back" seems to be the main consideration.
So there are various things you "could" do. But since you are "populating references" into an Object and then wanting to alter the order of objects in an array there really is only one way to fix this once and for all.
Fix the data in order as you create it
If you want your "comments" array sorted by the date they are "created_at" this even breaks down into multiple possibilities:
The array "should" have been added to in "insertion" order, so the "latest" is last as you note, but in recent (past couple of years now) versions of MongoDB you can also change this with $position as a modifier to $push:
Article.update(
{ "_id": articleId },
{
"$push": { "comments": { "$each": [newComment], "$position": 0 } }
},
function(err,result) {
// other work in here
}
);
This "prepends" the array element to the existing array at the "first" (0) index so it is always at the front.
If "positional" updates do not fit for logical reasons, or you just "want to be sure", the $sort modifier to $push has been around for even longer:
Article.update(
{ "_id": articleId },
{
"$push": {
"comments": {
"$each": [newComment],
"$sort": { "created_at": -1 }
}
}
},
function(err,result) {
// other work in here
}
);
And that will "sort" the array by the specified property of its element documents on each modification. You can even do:
Article.update(
{ },
{
"$push": {
"comments": {
"$each": [],
"$sort": { "created_at": -1 }
}
}
},
{ "multi": true },
function(err,result) {
// other work in here
}
);
And that will sort every "comments" array in your entire collection by the specified field in one hit.
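Both variants share one update shape, which can be sketched as a small builder. buildSortedCommentPush is a hypothetical helper; note that the $sort key is the plain field name of the array elements, without a leading $.

```javascript
// Hypothetical helper: builds a $push that appends comments and then
// re-sorts the whole array, newest first.
function buildSortedCommentPush(newComments) {
  return {
    $push: {
      comments: {
        // An empty $each adds nothing and only re-sorts the stored array.
        $each: newComments,
        $sort: { created_at: -1 },
      },
    },
  };
}

const resortOnly = buildSortedCommentPush([]);
console.log(JSON.stringify(resortOnly));
// Usage idea: Article.update({ _id: articleId }, buildSortedCommentPush([newComment]), callback);
```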
Other solutions are possible using either .aggregate() to sort the array and/or "re-casting" to mongoose objects after you have done that operation or after doing your own .sort() on the plain object.
Both of these really involve creating a separate model object and "schema" with the embedded items including the "referenced" information. So you could work along those lines, but it seems to be unnecessary overhead when you could just store the data sorted the way you most need it in the first place.
The alternate is to make sure that fields like "virtuals" always "serialize" into an object format with .toObject() on call and just live with the fact that all the methods are gone now and work with the properties as presented.
The last is a "sane" approach, but if what you typically use is "created_at" order, then it makes much more sense to "store" your data that way with every operation so when you "retrieve" it, it stays in the order that you are going to use.
You could also use JavaScript's native Array sort method after you've retrieved and populated the results:
// Convert the mongoose doc into a 'vanilla' object:
const article = yourArticleDoc.toObject();
// Sort oldest first by the comments' created_at field
article.comments.sort((a, b) => {
const aDate = new Date(a.created_at);
const bDate = new Date(b.created_at);
if (aDate < bDate) return -1;
if (aDate > bDate) return 1;
return 0;
});
As of the current release of MongoDB you must sort the array after database retrieval. But this is easy to do in one line using _.sortBy() from Lodash.
https://lodash.com/docs/4.17.15#sortBy
comments = _.sortBy(sorted.comments, 'created_at').reverse();
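The same newest-first ordering can be sketched without Lodash. sortCommentsNewestFirst is a hypothetical helper; it assumes created_at is a Date or a value that new Date() can parse.

```javascript
// Hypothetical helper: returns a new array sorted by created_at, newest first.
function sortCommentsNewestFirst(comments) {
  // Copy first so the original array keeps its order.
  return [...comments].sort(
    (a, b) => new Date(b.created_at).getTime() - new Date(a.created_at).getTime()
  );
}

const comments = [
  { body: 'first', created_at: '2020-01-01T00:00:00Z' },
  { body: 'third', created_at: '2020-03-01T00:00:00Z' },
  { body: 'second', created_at: '2020-02-01T00:00:00Z' },
];
console.log(sortCommentsNewestFirst(comments).map((c) => c.body)); // [ 'third', 'second', 'first' ]
```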
