Avoid Aggregate 16MB Limit - node.js

I have a collection of about 1M documents. Each document has an internalNumber property, and I need to get all internalNumbers in my node.js code.
Previously I was using
db.docs.distinct("internalNumber")
or
collection.distinct('internalNumber', {}, {}, (err, result) => { /* ... */ })
in Node.
But with the growth of the collection I started to get the error: distinct is too big, 16m cap.
Now I want to use aggregation. It consumes a lot of memory and it is slow, but that is OK since I only need to do it once at script startup. I've tried the following in the Robo 3T GUI tool:
db.docs.aggregate([{$group: {_id: '$internalNumber'} }]);
It works, and I wanted to use it in node.js code the following way:
collection.aggregate([{$group: {_id: '$internalNumber'} }],
  (err, docs) => { /* ... */ });
But in Node I get an error: "MongoError: aggregation result exceeds maximum document size (16MB) at Function.MongoError.create".
Please help to overcome that limit.

The problem is that the native driver differs from the shell's default behavior: the shell actually returns a "cursor" object, whereas the native driver needs that option explicitly.
Without a "cursor", .aggregate() returns the whole result set as a single BSON document (an array of documents), which is subject to the 16MB limit, so we turn it into a cursor to avoid that:
let cursor = collection.aggregate(
  [{ "$group": { "_id": "$internalNumber" } }],
  { "cursor": { "batchSize": 500 } }
);
cursor.toArray((err, docs) => {
  // work with results
});
Then you can use regular methods like .toArray() to turn the results into a JavaScript array, which on the "client" side does not share the same limitation, or any of the other methods for iterating a "cursor".
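If even the intermediate array is unwanted, the cursor can also be consumed one document at a time. Here is a minimal sketch using the native driver's forEach() on the cursor from above (the per-document handling is just an illustration):
// Stream the distinct values one document at a time instead of
// buffering everything with .toArray().
cursor.forEach(
  (doc) => console.log(doc._id),    // each doc holds one distinct internalNumber in _id
  (err) => { if (err) throw err; }  // called once when iteration ends or fails
);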

For Casbah users:
val pipeline = ...
collection.aggregate(pipeline, AggregationOptions(batchSize = 500, outputMode = AggregationOptions.CURSOR))

Related

MongoDB and Node.js aggregate using $sample isn't returning a document

I'm new to Node.js and MongoDB and I'm trying to get an aggregate query running that will select a random document from the database and return it. I've looked all over the internet to figure out what I'm doing wrong, and from what I can see my code looks like it should work. For some reason, however, when I try printing the result to the console, it gives me an AggregationCursor object and I can't find the document I want anywhere within it. Here is my code for the aggregate function.
//get a random question
route.get('/question/random', function (req, res) {
  database.collection('questions').aggregate(
    [{ $sample: { size: 1 } }],
    function (err, result) {
      console.log(result);
    });
});
It's because the aggregate method returns an AggregationCursor, which won't yield any documents unless you iterate through it.
For a simple iteration, you can do:
database.collection('questions').aggregate([{$sample: {size: 1}}]).forEach(console.log);
The forEach() method on the cursor will iterate it, and in this example will print it to the console.
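If the goal is to send the sampled document back from the route, one option is to convert the cursor to an array and respond with its single element. A minimal sketch (the error handling shown is an assumption, kept deliberately simple):
route.get('/question/random', function (req, res) {
  database.collection('questions')
    .aggregate([{ $sample: { size: 1 } }])
    .toArray(function (err, docs) {
      if (err) return res.status(500).send(err.message);
      res.json(docs[0]); // $sample with size 1 yields a one-element array
    });
});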

Mongoose, Nodejs - replace many documents in one I/O?

I have an array of objects and I want to store them in a collection using only one I/O operation, if possible. If a document already exists in the collection I want to replace it, or insert it otherwise.
These are the solutions I found, but they don't work exactly as I want:
insertMany(): this doesn't replace documents that already exist, but throws an exception instead (this is what I found in the MongoDB documentation, but I don't know if it's the same in mongoose).
update() or updateMany() with upsert = true: this doesn't help either, because it applies the same update to all the stored documents.
There is no replaceMany() in mongodb or mongoose.
Does anyone know an optimal way to do a replaceMany using mongoose and node.js?
There is bulkWrite (https://docs.mongodb.com/manual/reference/method/db.collection.bulkWrite/), which makes it possible to execute multiple operations at once. In your case, you can use it to perform multiple replaceOne operations with upsert. The code below shows how you can do it with Mongoose:
// Assuming *data* is an array of documents that you want to insert (or replace)
// Assuming *data* is an array of documents that you want to insert (or replace)
const bulkData = data.map(item => ({
  replaceOne: {
    upsert: true,
    filter: {
      // Filter specification. You must provide a field that
      // identifies *item*
    },
    replacement: item
  }
}));
db.bulkWrite(bulkData);
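As a usage sketch, assuming each item in data already carries its own _id (an assumption about your documents) and a Mongoose model named Item (a hypothetical name), the filter can simply key on that field:
// Hypothetical model name; the _id filter assumes each item has one.
const bulkData = data.map(item => ({
  replaceOne: {
    upsert: true,
    filter: { _id: item._id },
    replacement: item
  }
}));
Item.bulkWrite(bulkData).then(result => {
  console.log(result.upsertedCount, result.modifiedCount); // inserted vs replaced counts
});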
You need to query like this:
db.getCollection('hotspot').update({
  /* Your Condition */
}, {
  $set: {
    "New Key": "Value"
  }
}, {
  multi: true,
  upsert: true
});
It fulfils your requirements!

MongoDB fulltext search: Overflow sort stage buffered data usage

I am trying to implement mongo text search in my node(express.js) application.
Here is my code:
Collection.find({$text: {$search: searchString}},
  {score: {$meta: "textScore"}})
  .sort({score: {$meta: 'textScore'}})
  .exec(function(err, docs) {
    //Process docs
  });
I am getting the following error when the text search is performed on a large dataset:
MongoError: Executor error: Overflow sort stage buffered data usage of 33554558 bytes exceeds internal limit of 33554432 bytes
I am aware that MongoDB can sort a maximum of 32MB of data and that this error can be avoided by adding an index on the field the collection is sorted by. But in my case I am sorting the collection by textScore, and I am not sure whether it is possible to create an index for that field. If not, is there any workaround for this?
NOTE: I am aware there are similar questions on SO but most of these questions do not have textScore as sort criteria and therefore my question is different.
You can use aggregate to circumvent the limit.
Collection.aggregate([
  { $match: { $text: { $search: searchString } } },
  { $sort: { score: { $meta: "textScore" } } }
])
The $sort stage has a 100 MB limit. If you need more, you can use allowDiskUse, which writes to temporary files while the sort takes place. To do that, just add allowDiskUse: true to the aggregate options.
If your result is greater than 16MB (i.e. MongoDB's document size limit), you need to request a cursor to iterate through your data. Just add .cursor() before your exec; there's a detailed example at http://mongoosejs.com/docs/api.html#aggregate_Aggregate-cursor
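Putting the two together, here is a minimal Mongoose sketch (assuming the same Collection model and searchString from the question, and following the cursor style shown in the linked docs) that enables allowDiskUse and streams the sorted results through a cursor:
// allowDiskUse lets $sort spill to temp files; .cursor() streams the
// results so the 16MB response limit no longer applies.
var cursor = Collection.aggregate([
  { $match: { $text: { $search: searchString } } },
  { $sort: { score: { $meta: 'textScore' } } }
])
  .allowDiskUse(true)
  .cursor({ batchSize: 100 })
  .exec();

cursor.each(function (err, doc) {
  // doc is null once the cursor is exhausted
  if (doc) {
    // process one matched document at a time
  }
});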

Using the find method on a MongoDB collection with Monk

I am working through a MEAN stack tutorial. It contains the following code as a route in index.js. The name of my Mongo collection is brandcollection.
/* GET Brand Complaints page. */
router.get('/brands', function(req, res) {
  var db = req.db;
  var collection = db.get('brandcollection');
  collection.find({}, {}, function(e, docs) {
    res.render('brands', {
      "brands": docs
    });
  });
});
I would like to modify this code but I don't fully understand how the .find method is being invoked. Specifically, I have the following questions:
What objects are being passed to function(e, docs) as its arguments?
Is function(e, docs) part of the MongoDB syntax? I have looked at the docs on Mongo CRUD operations and couldn't find a reference to it. And it seems like the standard syntax for a Mongo .find operation is collection.find({},{}).someCursorLimit(). I have not seen a reference to a third parameter in the .find operation, so why is one allowed here?
If function(e, docs) is not a MongoDB operation, is it part of the Monk API?
It is clear from the tutorial that this block of code returns all of the documents in the collection and places them in an object as an attribute called "brands." However, what role specifically does function(e, docs) play in that process?
Any clarification would be much appreciated!
The first parameter is the query.
The second parameter (which is optional) is the projection, i.e. for when you want to restrict the contents of the matched documents:
collection.find({ qty: { $gt: 25 } }, { item: 1, qty: 1 }, function(e, docs) {})
would return only the item and qty fields of the matched documents.
The third parameter is the callback function, which is called after the query completes. function(e, docs) is the standard callback syntax of the MongoDB driver for Node.js. The first parameter e is the error, and docs is the array of matched documents: if an error occurs it is given in e, and if the query is successful the matched documents are given in docs (the name can be anything you want).
The cursor has various methods which can be used to manipulate the matched documents before mongoDB returns them.
collection.find( { qty: { $gt: 25 } }, { item: 1, qty: 1 })
is a cursor; you can do various operations on it.
collection.find({ qty: { $gt: 25 } }, { item: 1, qty: 1 }).skip(10).limit(5).toArray(function(e, docs) {
  ...
})
meaning you will skip the first 10 matched documents and then return a maximum of 5 documents.
All of this is covered in the docs. I think it's better to use mongoose instead of the native driver because of its features and popularity.
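To tie this back to Monk, here is a sketch of the same route with a projection and a sort passed via the options object; the name field is hypothetical, and the fields/sort option names assume Monk forwards them to the underlying driver:
router.get('/brands', function(req, res) {
  var db = req.db;
  var collection = db.get('brandcollection');
  // {} matches every document; the options project and sort by `name`
  collection.find({}, { fields: { name: 1 }, sort: { name: 1 } }, function(e, docs) {
    if (e) return res.status(500).send(e.message);
    res.render('brands', { "brands": docs });
  });
});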

Mongodb toArray() performance

I have a collection 'matches' with 727000 documents inside. Each document has 6 fields, no arrays, just simple integers and ObjectIds. I query the collection as follows:
matches.find({
$or: [{
homeTeamId: getObjectId(teamId)
}, {
awayTeamId: getObjectId(teamId)
}
],
season: season,
seasonDate: {
'$gt': dayMin,
'$lt': dayMax
}
}).sort({
seasonDate: 1
}).toArray(function (e, res) {
callback(res);
});
The query returns only around 7-8 documents.
The query takes about ~100ms, which I think is quite reasonable, but the main problem is that when I call the toArray() method, it adds about ~600ms!
I am running the server on my laptop (Intel Core i5, 6GB RAM), but I can't believe it adds 600ms for 7-8 documents.
I tried the mongodb-native driver, then switched to mongoskin, and I'm still getting the same slow results.
Any suggestions ?
The toArray() method iterates through all cursor elements and loads them into memory, which is a costly operation. You can add an index to improve your query's performance, and/or avoid toArray() by iterating through the cursor yourself.
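A sketch of both suggestions against the collection above: compound indexes matching the query and sort (the field order is an assumption), and forEach() iteration in place of toArray():
// One index per $or branch so either side of the query is covered
// (run once, e.g. at startup).
matches.createIndex({ homeTeamId: 1, season: 1, seasonDate: 1 });
matches.createIndex({ awayTeamId: 1, season: 1, seasonDate: 1 });

// Iterate the cursor directly instead of buffering it into an array.
matches.find({ /* same query as above */ })
  .sort({ seasonDate: 1 })
  .forEach(function (doc) {
    // handle one document at a time
  }, function (err) {
    if (err) console.error(err); // called once when iteration ends
  });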
