Aggregate results from multiple Mongoose models - node.js

There are several models that are considerably different and belong to different collections, yet they have common fields that will be used when query results are aggregated together.
const BlogPost = mongoose.model('BlogPost', new mongoose.Schema({
title: String,
body: String,
promoteAtMainPage: {
type: Boolean,
default: false
},
timestamp: Date,
active: Boolean,
/* the rest are different */
}));
const Article = mongoose.model('Article', new mongoose.Schema({
title: String,
body: String,
promoteAtMainPage: {
type: Boolean,
default: false
},
timestamp: Date,
active: Boolean,
/* the rest are different */
}));
Only common fields (title, body, timestamp) are used from the result.
Additionally, if promoteAtMainPage is missing from a document, it should be treated differently in these models (it defaults to true in one case and false in the other).
Currently this is done with result processing:
let blogPosts = await BlogPost.aggregate([{ $match: { active: true } }, { $sort: { timestamp: -1 } }, { $limit: 100 }]);
for (let blogPost of blogPosts)
blogPost.promoteAtMainPage = ('promoteAtMainPage' in blogPost)
? blogPost.promoteAtMainPage
: false;
let articles = await Article.aggregate([{ $match: { active: true } }, { $sort: { timestamp: -1 } }, { $limit: 100 }]);
for (let article of articles)
article.promoteAtMainPage = ('promoteAtMainPage' in article)
? article.promoteAtMainPage
: true;
let mainPagePosts = [...blogPosts, ...articles]
.filter(post => post.promoteAtMainPage)
.sort((a, b) => b.timestamp - a.timestamp)
.slice(0, 100);
This results in requesting 200 documents instead of 100 and doing an extra sort.
Is it possible in this case to aggregate the results from different collections using only Mongoose or MongoDB?
Can the missing promoteAtMainPage field be handled by MongoDB as well, or is the only reasonable option to run a migration that adds a default promoteAtMainPage value to all existing documents?

Have a look at $mergeObjects in the MongoDB documentation. It merges multiple documents into a single document, so combined with $lookup it can pull fields from documents in different collections into one result document.
https://docs.mongodb.com/manual/reference/operator/aggregation/mergeObjects/
You might be able to use that to combine your BlogPost and Article collections.
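As a different technique from $mergeObjects, the $unionWith stage (MongoDB 4.4 and later) appends the documents of a second collection to the pipeline, and $ifNull can supply the per-collection default for a missing promoteAtMainPage. The following is only a sketch that builds the pipeline as plain objects. The helper names are hypothetical, and the collection name "articles" is an assumption based on Mongoose's default pluralization of the Article model.

```javascript
// Sketch only: builds an aggregation pipeline as plain objects.
// Assumptions: MongoDB 4.4+ ($unionWith) and an Article collection named
// "articles" (Mongoose's default pluralization). Helper names are hypothetical.

// Stages applied to one collection: keep active documents, fill in the
// collection-specific default when promoteAtMainPage is missing or null,
// keep only promoted documents, and project the common fields.
function promotedStages(defaultPromote) {
  return [
    { $match: { active: true } },
    { $addFields: { promoteAtMainPage: { $ifNull: ['$promoteAtMainPage', defaultPromote] } } },
    { $match: { promoteAtMainPage: true } },
    { $project: { title: 1, body: 1, timestamp: 1 } },
  ];
}

function buildMainPagePipeline(limit = 100) {
  return [
    // Blog posts: a missing flag counts as false.
    ...promotedStages(false),
    // Articles: a missing flag counts as true.
    { $unionWith: { coll: 'articles', pipeline: promotedStages(true) } },
    // Sort and limit the combined stream once, on the server.
    { $sort: { timestamp: -1 } },
    { $limit: limit },
  ];
}

console.log(JSON.stringify(buildMainPagePipeline(), null, 2));
```

Under those assumptions it could be run as `await BlogPost.aggregate(buildMainPagePipeline())`, which returns at most 100 documents already filtered and sorted, without a migration.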

Related

How to make query for 3 level nested object in mongodb

I am trying to query and filter objects by comparing specific fields at the third level of my objects. I am not sure how to use a filter with $lte or $gte at the third level. For example, in the schema below I want to find documents whose maximum delivery time (delivery_rule -> time -> max) is less than or equal to a given value, but I can't get it to work with this query:
if (filters.time) {
query = {
...query,
"delivery_rule.time.max": { $lte: filters.time }
};
}
and my schema is :
var VendorSchema = new mongoose.Schema({
  ...,
  delivery_rule: {
    ...,
    time: {
      min: {
        type: Number,
        default: 0
      },
      mid: {
        type: Number,
        default: 0
      },
      max: {
        type: Number,
        default: 0
      }
    }
  }
});
module.exports = mongoose.model("Vendor", VendorSchema);
When I run my query with filters.time = 30 in the shell it returns an empty array, but I have 5 objects with time 60.
The query I wrote works correctly; I just got confused while testing. A max of 60 does not satisfy $lte: 30, so an empty result is expected.
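For reference, the dot-notation filter can be built by a small helper like the one below. This is a minimal sketch: buildVendorQuery is a hypothetical name, and unlike the original `if (filters.time)` check it also accepts a time of 0.

```javascript
// Hypothetical helper: builds a Vendor filter object from optional inputs.
function buildVendorQuery(filters, base = {}) {
  const query = { ...base };
  if (filters.time != null) {
    // Dot notation reaches delivery_rule -> time -> max in one key.
    query['delivery_rule.time.max'] = { $lte: filters.time };
  }
  return query;
}

console.log(buildVendorQuery({ time: 30 }));
console.log(buildVendorQuery({}));
```

The resulting object can be passed directly to Vendor.find().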

Mongoose filter nested array in pre find query

I have the following issue. I have some comments that are soft-deletable. So they have a flag is_deleted and it is set to true when a record is deleted.
My comments aren't an independent model, but they are a nested array in another model (simplified model):
let CommentSchema = new Schema({
text: {
type: String,
required: true
},
modified_at: {
type: Date,
default: null
},
created_at: {
type: Date,
default: Date.now
},
is_deleted: {
type: Boolean,
default: false
},
});
let BookSchema = new Schema({
...
comments: [CommentSchema],
...
});
Now when I get all my Books with Books.find({}, (err, books) => {}) I wanted to filter out the deleted comments in:
BookSchema.pre('find', function() {
this.aggregate(
{},
{ $unwind: '$comments'},
{ $match: {'comments.is_deleted': false}})
});
But it does not work. Any idea how to write the query so that it only returns the non-deleted nested comments, without creating an independent Comment collection?
EDIT: I didn't mention it's an endpoint where I access only one book object. The resource url is like this: /books/{bookId}/comments. But also nice to have if it would work when getting all book objects.
You can use the positional $ operator and filter directly in find, as explained very well by @Blakes Seven here:
Books.find({
'comments.is_deleted': false,
}, {
'comments.$': 1,
});
EDIT:
As you said, the $ operator returns only one element. Here is what the documentation says:
The positional $ operator limits the contents of an <array> from the
query results to contain only the first element matching the query
document
There are two solutions to your problem:
Use an aggregation, which is usually slower for the database to execute
Get the books and filter the comments using loops
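// find() is called on the model (Books), not on the schema
Books.find({
  'comments.is_deleted': false,
}).then(books => books.map(x => Object.assign(x, {
  // keep only the comments that are not flagged as deleted
  comments: x.comments.filter(y => !y.is_deleted),
})));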
The find gets all books that have at least one non-deleted comment.
The map loops over each book.
We then remove the comments marked as deleted.
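The aggregation option mentioned above could look like the following sketch, which uses the $filter array operator to strip deleted comments on the server. The helper name is hypothetical, and it assumes bookId is already an ObjectId, since Mongoose does not cast values inside aggregation pipelines.

```javascript
// Sketch only: builds the pipeline as plain objects.
// Assumption: bookId is already an ObjectId (aggregate() does not cast it).
function buildBookCommentsPipeline(bookId) {
  return [
    { $match: { _id: bookId } },
    {
      $addFields: {
        comments: {
          $filter: {
            input: '$comments',
            as: 'comment',
            // $ne true also keeps comments that have no is_deleted flag at all.
            cond: { $ne: ['$$comment.is_deleted', true] },
          },
        },
      },
    },
  ];
}

console.log(JSON.stringify(buildBookCommentsPipeline('someBookId'), null, 2));
```

Dropping the $match stage would apply the same filtering to every book. The result documents are plain objects, not Mongoose documents.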

Find documents with limits from multiple MongoDB collections and as return sorted list using Mongoose

If I have different types of documents, each in their own collections, is there a way to search for posts from all collections and return them as a single list ordered by something like a datestamp?
Further, I need:
To be able to decide how many posts I need in total from all collections
The posts should be ordered by the same criteria - which means the number of posts will be different from each collection
To be able to start collecting with an offset (say, give me 100 posts starting at post no. 201).
If I saved all documents in the same collection this task would be rather easy but would also require a dynamic, largely undocumented schema since each document will be very different except for a few parameters such as the date.
So, is there a way to keep my documents in well defined schemas, each in separate collections but still being able to accomplish the above?
For argument's sake, here's how the schemas could look divided up:
var InstagramPostSchema = new Schema({
date: Date,
imageUrl: String,
...
})
var TwitterPostSchema = new Schema({
date: Date,
message: String,
...
})
And if I made one universal schema it could look like this:
var SocialPostSchema = new Schema({
date: Date,
type: String,
postData: {}
})
What's the preferred way to do this?
The ideal way would be if I could write separate schemas that inherit from a common base schema, but I'm not familiar enough with Mongoose and MongoDB to know if there's a native way to do this.
There is a good way to do this which is also a bit nicer, with some benefits over your final suggestion, and that is to use discriminators.
The basic idea is that there is a base schema, with common properties or even no properties at all, from which you define your main collection. Each other schema then inherits from that and also shares the same collection.
As a basic demonstration:
var async = require('async'),
util = require('util'),
mongoose = require('mongoose'),
Schema = mongoose.Schema;
mongoose.connect('mongodb://localhost/test');
function BaseSchema() {
Schema.apply(this,arguments);
this.add({
date: { type: Date, default: Date.now },
name: { type: String, required: true }
});
}
util.inherits(BaseSchema,Schema);
var socialPostSchema = new BaseSchema();
var instagramPostSchema = new BaseSchema({
imageUrl: { type: String, required: true }
});
var twitterPostSchema = new BaseSchema({
message: { type: String, required: true }
});
var SocialPost = mongoose.model('SocialPost', socialPostSchema ),
InstagramPost = SocialPost.discriminator(
'InstagramPost', instagramPostSchema ),
TwitterPost = SocialPost.discriminator(
'TwitterPost', twitterPostSchema );
async.series(
[
function(callback) {
SocialPost.remove({},callback);
},
function(callback) {
InstagramPost.create({
name: 'My instagram pic',
imageUrl: '/myphoto.png'
},callback);
},
function(callback) {
setTimeout(
function() {
TwitterPost.create({
name: "My tweet",
message: "ham and cheese panini #livingthedream"
},callback);
},
1000
);
},
function(callback) {
SocialPost.find({}).sort({ "date": -1 }).exec(callback);
}
],
function(err,results) {
if (err) throw err;
results.shift();
console.dir(results);
mongoose.disconnect();
}
);
With output:
[ { __v: 0,
name: 'My instagram pic',
imageUrl: '/myphoto.png',
__t: 'InstagramPost',
date: Wed Aug 19 2015 22:53:23 GMT+1000 (AEST),
_id: 55d47c43122e5fe5063e01bc },
{ __v: 0,
name: 'My tweet',
message: 'ham and cheese panini #livingthedream',
__t: 'TwitterPost',
date: Wed Aug 19 2015 22:53:24 GMT+1000 (AEST),
_id: 55d47c44122e5fe5063e01bd },
[ { _id: 55d47c44122e5fe5063e01bd,
name: 'My tweet',
message: 'ham and cheese panini #livingthedream',
__v: 0,
__t: 'TwitterPost',
date: Wed Aug 19 2015 22:53:24 GMT+1000 (AEST) },
{ _id: 55d47c43122e5fe5063e01bc,
name: 'My instagram pic',
imageUrl: '/myphoto.png',
__v: 0,
__t: 'InstagramPost',
date: Wed Aug 19 2015 22:53:23 GMT+1000 (AEST) } ] ]
So the things to notice there are that even though we defined separate models and even separate schemas, all items are in fact in the same collection. As part of the discriminator, each document stored has a __t field depicting its type.
So the really nice things here are:
You can store everything in one collection and query all objects together
You can separate validation rules per schema and/or define things in a "base" so you don't need to write it out multiple times.
The objects "explode" into their own class definitions by the attached schema to the model for each type. This includes any attached methods. So these are first class objects when you create or retrieve the data.
If you wanted to work with just a specific type such as "TwitterPost", then using that model "automatically" filters out anything else but the "twitter" posts from any query operations performed, just by using that model.
Keeping things in the one collection makes a lot of sense, especially if you want to try and aggregate data across the information for different types.
A word of caution is that though you can have completely different objects using this pattern, it is generally wise to have as much in common as makes sense to your operations. This is particularly useful in querying or aggregating across different types.
So where possible, try to convert "legacy imported" data to a more "common" format of fields, and just keep the unique properties that are really required for each object type.
As to the first part of your question where you wanted to query "each collection" with something like different limits and then sort the overall results from each, well you can do that too.
There are various techniques, but keeping in the MongoDB form, there is nedb, which you can use to both store the combined results and "sort" them as well. And all is done in a manner you are used to:
var async = require('async'),
util = require('util'),
mongoose = require('mongoose'),
DataStore = require('nedb'),
Schema = mongoose.Schema;
mongoose.connect('mongodb://localhost/test');
function BaseSchema() {
Schema.apply(this,arguments);
this.add({
date: { type: Date, default: Date.now },
name: { type: String, required: true }
});
}
util.inherits(BaseSchema,Schema);
var socialPostSchema = new BaseSchema();
var instagramPostSchema = new BaseSchema({
imageUrl: { type: String, required: true }
});
var twitterPostSchema = new BaseSchema({
message: { type: String, required: true }
});
var SocialPost = mongoose.model('SocialPost', socialPostSchema ),
InstagramPost = SocialPost.discriminator(
'InstagramPost', instagramPostSchema ),
TwitterPost = SocialPost.discriminator(
'TwitterPost', twitterPostSchema );
async.series(
[
function(callback) {
SocialPost.remove({},callback);
},
function(callback) {
InstagramPost.create({
name: 'My instagram pic',
imageUrl: '/myphoto.png'
},callback);
},
function(callback) {
setTimeout(
function() {
TwitterPost.create({
name: "My tweet",
message: "ham and cheese panini #livingthedream"
},callback);
},
1000
);
},
function(callback) {
var ds = new DataStore();
async.parallel(
[
function(callback) {
InstagramPost.find({}).limit(1).exec(function(err,posts) {
async.each(posts,function(post,callback) {
post = post.toObject();
post.id = post._id.toString();
delete post._id;
ds.insert(post,callback);
},callback);
});
},
function(callback) {
TwitterPost.find({}).limit(1).exec(function(err,posts) {
async.each(posts,function(post,callback) {
post = post.toObject();
post.id = post._id.toString();
delete post._id;
ds.insert(post,callback);
},callback);
});
}
],
function(err) {
if (err) return callback(err);
ds.find({}).sort({ "date": -1 }).exec(callback);
}
);
}
],
function(err,results) {
if (err) throw err;
results.shift();
console.dir(results);
mongoose.disconnect();
}
);
Same output as before with the latest post sorted first, except that this time a query was sent to each model and we just got results from each and combined them.
If you change the query output and writes to the combined model to use "stream" processing, then you even have basically the same memory consumption and likely faster processing of results from parallel queries.
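For small pages, the combine step can also be done with plain arrays instead of nedb. The sketch below swaps in that simpler technique and adds the offset requirement from the question. mergePage is a hypothetical helper, and it assumes each per-collection query was sorted by date descending and limited to offset + limit documents, which is enough to guarantee that the requested page is complete.

```javascript
// Hypothetical helper: merge per-collection results and cut out one page.
// Assumption: every result set holds that collection's newest
// (offset + limit) posts, so no post of the requested page can be missing.
function mergePage(resultSets, offset, limit) {
  return resultSets
    .flat()
    .sort((a, b) => b.date - a.date) // newest first
    .slice(offset, offset + limit);
}

// Stand-in data instead of real query results.
const instagram = [5, 3, 1].map(t => ({ type: 'InstagramPost', date: new Date(t) }));
const twitter = [4, 2].map(t => ({ type: 'TwitterPost', date: new Date(t) }));

const page = mergePage([instagram, twitter], 1, 3);
console.log(page.map(p => p.date.getTime())); // [ 4, 3, 2 ]
```

The cost grows with the offset, because every collection has to return offset + limit documents. For deep paging, the single-collection discriminator approach with .skip() and .limit() avoids that.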

mongoose subdocument sorting

I have an article schema with a comments subdocument array, which contains all the comments I got for this particular article.
What I want to do is select an article by id, populate its author field and also the author field in comments, then sort the comments subdocuments by date.
The article schema:
var articleSchema = new Schema({
title: { type: String, default: '', trim: true },
body: { type: String, default: '', trim: true },
author: { type: Schema.ObjectId, ref: 'User' },
comments: [{
body: { type: String, default: '' },
author: { type: Schema.ObjectId, ref: 'User' },
created_at: { type : Date, default : Date.now, get: getCreatedAtDate }
}],
tags: { type: [], get: getTags, set: setTags },
image: {
cdnUri: String,
files: []
},
created_at: { type : Date, default : Date.now, get: getCreatedAtDate }
});
Static method on the article schema (I would love to sort the comments here; can I do that?):
load: function (id, cb) {
this.findOne({ _id: id })
.populate('author', 'email profile')
.populate('comments.author')
.exec(cb);
},
I have to sort it elsewhere:
exports.load = function (req, res, next, id) {
var User = require('../models/User');
Article.load(id, function (err, article) {
var sorted = article.toObject({ getters: true });
sorted.comments = _.sortBy(sorted.comments, 'created_at').reverse();
req.article = sorted;
next();
});
};
I call toObject to convert the document to a plain JavaScript object. I can keep my getters and virtuals, but what about methods?
Anyway, I do the sorting logic on the plain object and I'm done.
I am quite sure there is a much better way of doing this, please let me know.
I could have written this out as a few things, but on consideration "getting the mongoose objects back" seems to be the main consideration.
So there are various things you "could" do. But since you are "populating references" into an Object and then wanting to alter the order of objects in an array there really is only one way to fix this once and for all.
Fix the data in order as you create it
If you want your "comments" array sorted by the date they are "created_at" this even breaks down into multiple possibilities:
It "should" have been added to in "insertion" order, so the "latest" is last as you note, but you can also "modify" this in recent ( past couple of years now ) versions of MongoDB with $position as a modifier to $push :
Article.update(
{ "_id": articleId },
{
"$push": { "comments": { "$each": [newComment], "$position": 0 } }
},
function(err,result) {
// other work in here
}
);
This "prepends" the array element to the existing array at the "first" (0) index so it is always at the front.
If "positional" updates are not suitable for logical reasons, or you just "want to be sure", there is also the $sort modifier to $push, which has been around for even longer:
Article.update(
  { "_id": articleId },
  {
    "$push": {
      "comments": {
        "$each": [newComment],
        // field names in the $sort modifier take no "$" prefix
        "$sort": { "created_at": -1 }
      }
    }
  },
  function(err,result) {
    // other work in here
  }
);
That will "sort" the array by the specified property of its element documents on each modification. You can even do:
Article.update(
  { },
  {
    "$push": {
      "comments": {
        // pushing nothing still re-sorts the existing array
        "$each": [],
        "$sort": { "created_at": -1 }
      }
    }
  },
  { "multi": true },
  function(err,result) {
    // other work in here
  }
);
And that will sort every "comments" array in your entire collection by the specified field in one hit.
Other solutions are possible using either .aggregate() to sort the array and/or "re-casting" to mongoose objects after you have done that operation or after doing your own .sort() on the plain object.
Both of these really involve creating a separate model object and "schema" with the embedded items including the "referenced" information. So you could work along those lines, but it seems to be unnecessary overhead when you could just sort the data to your "most needed" means in the first place.
The alternate is to make sure that fields like "virtuals" always "serialize" into an object format with .toObject() on call and just live with the fact that all the methods are gone now and work with the properties as presented.
The last is a "sane" approach, but if what you typically use is "created_at" order, then it makes much more sense to "store" your data that way with every operation so when you "retrieve" it, it stays in the order that you are going to use.
You could also use JavaScript's native Array sort method after you've retrieved and populated the results:
// Convert the Mongoose document into a plain JavaScript object:
const article = yourArticleDoc.toObject();
// Sort the comments array in place, oldest first
// (created_at is the date field defined in the schema above):
article.comments.sort((a, b) => {
  const aDate = new Date(a.created_at);
  const bDate = new Date(b.created_at);
  if (aDate < bDate) return -1;
  if (aDate > bDate) return 1;
  return 0;
});
With a plain find query, the array has to be sorted after it is retrieved from the database. This is easy to do in one line using _.sortBy() from Lodash.
https://lodash.com/docs/4.17.15#sortBy
comments = _.sortBy(sorted.comments, 'created_at').reverse();

How to sort array of embedded documents via Mongoose query?

I'm building a node.js application with Mongoose and have a problem related to sorting embedded documents. Here's the schema I use:
var locationSchema = new Schema({
lat: { type: String, required: true },
lon: { type: String, required: true },
time: { type: Date, required: true },
acc: { type: String }
})
var locationsSchema = new Schema({
userId: { type: ObjectId },
source: { type: ObjectId, required: true },
locations: [ locationSchema ]
});
I'd like to output the locations embedded in the userLocations document, sorted by their time attribute. I currently do the sorting in JavaScript after I have retrieved the data from MongoDB, like so:
function locationsDescendingTimeOrder(loc1, loc2) {
return loc2.time.getTime() - loc1.time.getTime()
}
LocationsModel.findOne({ userId: theUserId }, function(err, userLocations) {
  userLocations.locations.sort(locationsDescendingTimeOrder).forEach(function(location) {
    console.log('location: ' + location.time);
  });
});
I did read about the sorting API provided by Mongoose but I couldn't figure out if it can be used for sorting arrays of embedded documents and if yes, if it is a sensible approach and how to apply it to this problem. Can anyone help me out here, please?
Thanks in advance and cheers,
Georg
You're doing it the right way, Georg. Your other options are either to sort locations by time upon embedding in the first place, or going the more traditional non-embedded route (or minimally embedded route so that you may be embedding an array of ids or something but you're actually querying the locations separately).
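The Mongoose sort API can also be applied here. Note, however, that .sort() orders the matched documents by the given path; it does not reorder the elements inside each document's locations array.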
LocationsModel.findOne({ userId: theUserId })
// .sort({ "locations.time": "desc" }) // option 1
.sort("-locations.time") // option 2
.exec((err, result) => {
// compute fetched data
})
Sort by field in nested array with Mongoose.js
More methods are mentioned in this answer as well
Sorting Options in Mongoose
Mongoose Sort API
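To have the database itself return the embedded locations in time order, one commonly used approach is an aggregation that unwinds the array, sorts, and regroups. The following is only a sketch with a hypothetical helper name. It assumes userId is already an ObjectId (aggregation pipelines are not cast by Mongoose), and documents with an empty locations array are dropped by $unwind.

```javascript
// Sketch only: builds the pipeline as plain objects.
// Assumption: userId is already an ObjectId.
function buildSortedLocationsPipeline(userId) {
  return [
    { $match: { userId } },
    // One output document per embedded location.
    { $unwind: '$locations' },
    // Newest location first.
    { $sort: { 'locations.time': -1 } },
    // Reassemble the document, pushing locations back in sorted order.
    {
      $group: {
        _id: '$_id',
        userId: { $first: '$userId' },
        source: { $first: '$source' },
        locations: { $push: '$locations' },
      },
    },
  ];
}

console.log(JSON.stringify(buildSortedLocationsPipeline('someUserId'), null, 2));
```

It could then be run as `LocationsModel.aggregate(buildSortedLocationsPipeline(theUserId))`. The results are plain objects, so sorting in JavaScript as in the question remains the simpler choice when Mongoose documents are needed.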

Resources