Mongoose update multiple geospacial index with no limit - node.js

I have some Mongoose Models with geospacial indexes:
var User = new Schema({
"name" : String,
"location" : {
"id" : String,
"name" : String,
"loc" : { type : Array, index : '2d'}
}
});
I'm trying to update all items that are in an area - for instance:
User.update({ "location.loc" : { "$near" : [ -122.4192, 37.7793 ], "$maxDistance" : 0.4 } }, { "foo" : "bar" },{ "multi" : true }, function(err){
console.log("done!");
});
However, this appears to only update the first 100 records. Looking at the docs, it appears there is a native limit on finds on geospatial indices for that applies when you don't set a limit.
(from docs:
Use limit() to specify a maximum number of points to return (a default limit of 100 applies if unspecified))
This appears to also apply to updates, regardless of the multi flag, which is a giant drag. If I apply an update, it only updates the first 100.
Right now the only way I can think of to get around this is to do something hideous like this:
Model.find({"location.loc" : { "$near" : [ -122.4192, 37.7793 ], "$maxDistance" : 0.4 } },{limit:0},function(err,results){
var ids = results.map(function(r){ return r._id; });
Model.update({"_id" : { $in : ids }},{"foo":"bar"},{multi:true},function(){
console.log("I have enjoyed crippling your server.");
});
});
While I'm not even entirely sure that would work (and it could be mildly optimized by only selecting the _id), I'd really like to avoid keeping an array of n ids in memory, as that number could get very large.
Edit:
The above hack doesn't even work, looks like a find with {limit:0} still returns 100 results. So, in an act of sheer desperation and frustration, I have written a recursive method to paginate through ids, then return them so I can update using the above method. I have added the method as an answer below, but not accepted it in hopes that someone will find a better way.
This is a problem in mongo server core as far as I can tell, so mongoose and node-mongodb-native are not to blame. However, this is really stupid, as geospacial indices is one of the few reasons to use mongo over some other more robust NoSQL stores.
Is there a way to achieve this? Even in node-mongodb-native, or the mongo shell, I can't seem to find a way to set (or in this case, remove by setting to 0) a limit on an update.

I'd love to see this issue fixed, but I can't figure out a way to set a limit on an update, and after extensive research, it doesn't appear to be possible. In addition, the hack in the question doesn't even work, I still only get 100 records with a find and limit set to 0.
Until this is fixed in mongo, here's how I'm getting around it: (!!WARNING: UGLY HACKS AHEAD:!!)
var getIdsPaginated = function(query,batch,callback){
// set a default batch if it isn't passed.
if(!callback){
callback = batch;
batch = 10000;
}
// define our array and a find method we can call recursively.
var all = [],
find = function(skip){
// skip defaults to 0
skip = skip || 0;
this.find(query,['_id'],{limit:batch,skip:skip},function(err,items){
if(err){
// if an error is thrown, call back with it and how far we got in the array.
callback(err,all);
} else if(items && items.length){
// if we returned any items, grab their ids and put them in the 'all' array
var ids = items.map(function(i){ return i._id.toString(); });
all = all.concat(ids);
// recurse
find.call(this,skip+batch);
} else {
// we have recursed and not returned any ids. This means we have them all.
callback(err,all);
}
}.bind(this));
};
// start the recursion
find.call(this);
}
This method will return a giant array of _ids. Because they are already indexed, it's actually pretty fast, but it's still calling the db many more times than is necessary. When this method calls back, you can do an update with the ids, like this:
Model.update(ids,{'foo':'bar'},{multi:true},function(err){ console.log('hooray, more than 100 records updated.'); });
This isn't the most elegant way to solve this problem, you can tune it's efficiency by setting the batch based on expected results, but obviously the ability to simply call update (or find for that matter) on $near queries without a limit would really help.

Related

MongoDB - Use results of multiple fetch queries in one update query

I am building a platform for students to give mock test on. Once the test is complete, a results are to be generated for them relative to other students who attempted the said test.
Report contains multiple parameters i.e. rank, rank within their batch, and stuff like average marks people got on the given test are updated.
To get each of this data, I need to perform a separate query on the database and then I got to update 1. the result of current user who attempted the test 2. the result of everyone else (i.e. everyone's rank changes on new attempts)
So I need to perform multiple queries to get the data and run 2-3 update queries to set the new data.
Given mongodb calls are asynchronous, I can't find a way to gather all of that data at one place to be updated.
One way is to put the next query within the callback function of the previous query but I feel like there should be a better way than that.
Maybe you could use Promise.all()
Example:
const initialQueries = []
initialQueries.push("Some promise(s)")
Promise.all(initialQueries).then(results => {
// After all initialQueries are finished
updateQueries()
}).catch(err => {
// At least one failed
})
Use db.collection.bulkWrite.
It allows multiple document insertions, updates (by _id or a custom filter), and even deletes.
const ObjectID = require('mongodb').ObjectID;
db.collection('tests').bulkWrite([
{ updateOne : {
"filter" : { "_id" : ObjectID("5d400af131602bf3fa09da3a") },
"update" : { $set : { "score" : 20 } }
}
},
{ updateOne : {
"filter" : { "_id" : ObjectID("5d233e7831602bf3fa996557") },
"update" : { $set : { "score" : 15 } }
}
}
]);
New in version 3.2.

Upsert and $inc Sub-document in Array

The following schema is intended to record total views and views for a very specific day only.
const usersSchema = new Schema({
totalProductsViews: {type: Number, default: 0},
productsViewsStatistics: [{
day: {type: String, default: new Date().toISOString().slice(0, 10), unique: true},
count: {type: Number, default: 0}
}],
});
So today views will be stored in another subdocument different from yesterday. To implement this I tried to use upsert so as subdocument will be created each day when product is viewed and counts will be incremented and recorded based on a particular day. I tried to use the following function but seems not to work the way I intended.
usersSchema.statics.increaseProductsViews = async function (id) {
//Based on day only.
const todayDate = new Date().toISOString().slice(0, 10);
const result = await this.findByIdAndUpdate(id, {
$inc: {
totalProductsViews: 1,
'productsViewsStatistics.$[sub].count': 1
},
},
{
upsert: true,
arrayFilters: [{'sub.day': todayDate}],
new: true
});
console.log(result);
return result;
};
What do I miss to get the functionality I want? Any help will be appreciated.
What you are trying to do here actually requires you to understand some concepts you may not have grasped yet. The two primary ones being:
You cannot use any positional update as part of an upsert since it requires data to be present
Adding items into arrays mixed with "upsert" is generally a problem that you cannot do in a single statement.
It's a little unclear if "upsert" is your actual intention anyway or if you just presumed that was what you had to add in order to get your statement to work. It does complicate things if that is your intent, even if it's unlikely give the finByIdAndUpdate() usage which would imply you were actually expecting the "document" to be always present.
At any rate, it's clear you actually expect to "Update the array element when found, OR insert a new array element where not found". This is actually a two write process, and three when you consider the "upsert" case as well.
For this, you actually need to invoke the statements via bulkWrite():
usersSchema.statics.increaseProductsViews = async function (_id) {
//Based on day only.
const todayDate = new Date().toISOString().slice(0, 10);
await this.bulkWrite([
// Try to match an existing element and update it ( do NOT upsert )
{
"updateOne": {
"filter": { _id, "productViewStatistics.day": todayDate },
"update": {
"$inc": {
"totalProductsViews": 1,
"productViewStatistics.$.count": 1
}
}
}
},
// Try to $push where the element is not there but document is - ( do NOT upsert )
{
"updateOne": {
"filter": { _id, "productViewStatistics.day": { "$ne": todayDate } },
"update": {
"$inc": { "totalProductViews": 1 },
"$push": { "productViewStatistics": { "day": todayDate, "count": 1 } }
}
}
},
// Finally attempt upsert where the "document" was not there at all,
// only if you actually mean it - so optional
{
"updateOne": {
"filter": { _id },
"update": {
"$setOnInsert": {
"totalProductViews": 1,
"productViewStatistics": [{ "day": todayDate, "count": 1 }]
}
}
}
])
// return the modified document if you really must
return this.findById(_id); // Not atomic, but the lesser of all evils
}
So there's a real good reason here why the positional filtered [<identifier>] operator does not apply here. The main good reason is the intended purpose is to update multiple matching array elements, and you only ever want to update one. This actually has a specific operator in the positional $ operator which does exactly that. It's condition however must be included within the query predicate ( "filter" property in UpdateOne statements ) just as demonstrated in the first two statements of the bulkWrite() above.
So the main problems with using positional filtered [<identifier>] are that just as the first two statements show, you cannot actually alternate between the $inc or $push as would depend on if the document actually contained an array entry for the day. All that will happen is at best no update will be applied when the current day is not matched by the expression in arrayFilters.
The at worst case is an actual "upsert" will throw an error due to MongoDB not being able to decipher the "path name" from the statement, and of course you simply cannot $inc something that does not exist as a "new" array element. That needs a $push.
That leaves you with the mechanic that you also cannot do both the $inc and $push within a single statement. MongoDB will error that you are attempting to "modify the same path" as an illegal operation. Much the same applies to $setOnInsert since whilst that operator only applies to "upsert" operations, it does not preclude the other operations from happening.
Thus the logical steps fall back to what the comments in the code also describe:
Attempt to match where the document contains an existing array element, then update that element. Using $inc in this case
Attempt to match where the document exists but the array element is not present and then $push a new element for the given day with the default count, updating other elements appropriately
IF you actually did intend to upsert documents ( not array elements, because that's the above steps ) then finally actually attempt an upsert creating new properties including a new array.
Finally there is the issue of the bulkWrite(). Whilst this is a single request to the server with a single response, it still is effectively three ( or two if that's all you need ) operations. There is no way around that and it is better than issuing chained separate requests using findByIdAndUpdate() or even updateOne().
Of course the main operational difference from the perspective of code you attempted to implement is that method does not return the modified document. There is no way to get a "document response" from any "Bulk" operation at all.
As such the actual "bulk" process will only ever modify a document with one of the three statements submitted based on the presented logic and most importantly the order of those statements, which is important. But if you actually wanted to "return the document" after modification then the only way to do that is with a separate request to fetch the document.
The only caveat here is that there is the small possibility that other modifications could have occurred to the document other than the "array upsert" since the read and update are separated. There really is no way around that, without possibly "chaining" three separate requests to the server and then deciding which "response document" actually applied the update you wanted to achieve.
So with that context it's generally considered the lesser of evils to do the read separately. It's not ideal, but it's the best option available from a bad bunch.
As a final note, I would strongly suggest actually storing the the day property as a BSON Date instead of as a string. It actually takes less bytes to store and is far more useful in that form. As such the following constructor is probably the clearest and least hacky:
const todayDate = new Date(new Date().setUTCHours(0,0,0,0))

reduce output must shrink more rapidly, on adding new document

I have couple of documents in couchdb, each having a cId field, such as -
{
"_id": "ccf8a36e55913b7cf5b015d6c50009f7",
"_rev": "8-586130996ad60ccef54775c51599e73f",
"cId": 1,
"Status": true
}
I have a simple view, which tries to return max of cId with map and reduce functions as follows -
Map
function(doc) {
emit(null, doc.cId);
}
Reduce
function(key, values, rereduce){
return Math.max.apply(null, values);
}
This works fine (output is 1) until I add one more document with cId = 2 in db. I am expecting output as 2 but it starts giving error as "Reduce output must shrink more rapidly". When I delete this document things are back to normal again. What can be the issue here? Is there any alternative way to achieve this?
Note: There are more views in db, which perform different role and few return json as well. They also start failing on this change.
You could simply use the built-in _statsreduce function, in order to get the maximum value. It is returned in the "max" field.

Mongodb, incrementing value inside an array. (save() ? update() ?)

var Poll = mongoose.model('Poll', {
title: String,
votes: {
type: Array,
'default' : []
}
});
I have the above schema for my simple poll, and I am uncertain of the best method to change the value of the elements in my votes array.
app.put('/api/polls/:poll_id', function(req, res){
Poll.findById(req.params.poll_id, function(err, poll){
// I see the official website of mongodb use something like
// db.collection.update()
// but that doesn't apply here right? I have direct access to the "poll" object here.
Can I do something like
poll.votes[1] = poll.votes[1] + 1;
poll.save() ?
Helps much appreciated.
});
});
You can to the code as you have above, but of course this involves "retrieving" the document from the server, then making the modification and saving it back.
If you have a lot of concurrent operations doing this, then your results are not going to be consistent, as there is a high potential for "overwriting" the work of another operation that is trying to modify the same content. So your increments can go out of "sync" here.
A better approach is to use the standard .update() type of operations. These will make a single request to the server and modify the document. Even returning the modified document as would be the case with .findByIdAndUpdate():
Poll.findByIdAndUpdate(req.params.poll_id,
{ "$inc": { "votes.1": 1 } },
function(err,doc) {
}
);
So the $inc update operator does the work of modifying the array at the specified position using "dot notation". The operation is atomic, so no other operation can modify at the same time and if there was something issued just before then the result would be correctly incremented by that operation and then also by this one, returning the correct data in the result document.

How do I $inc an entire object full of properties without building the query with a loop?

I have a collection of documents for the form:
{ name:String, groceries:{ apples:Number, cherries:Number, prunes:Number } }
Now, every query I have to increment with positive and/or negative values for each element in "groceries". It is not important what keys or how many, I just added some examples.
I could do a :
var dataToBeIncremented = stuff;
var $inc = {};
for each( var index in dataToBeIncremented )
{
$inc[ "groceries." + index ] = dataToBeIncremented[ index ];
}
then
db.update( { _id:targetID }, { $inc : query } )
however, I might have thousands of grocery elements and find doing this loop at each update to be ugly and unoptimized.
I would like to know how to void this or why it can't be optimized.
Actually there is no way to avoid it, because there is no such command that can increment all the values inside the subdocument.
So the only way to do it is to do something like you have done:
{
"$inc": {
"groceries.apples" : 1,
"groceries.cherries" : 1,
"groceries.prunes" : 1
}
}
Because you do not know what are the fields exactly, you need to find them beforehand and to create the $inc statement. There is one good thing about these updates: no matter how may elements do you have, you will still need only 2 queries (find what to update and to actually perform update).
I was also thinking how to achieve a better results with a different schema, but apparently you have to cope with what you have.

Resources