I am working on a Node.js + MongoDB application. The application inserts records into MongoDB. For example, let's take the simple record below:
{
"name": "Sachin",
"age" : 11,
"class": 5,
"percentage": 78,
"rating": 5
}
Now the end user can set different rules for which they want to get a notification/alert when a specific condition is satisfied. For example, we can have a rule like:
1) Rule1: Generate notification/alert if "percentage" is less than 40
In order to achieve this, I am using replication and a tailable cursor on the oplog. So whenever a new record gets added to the collection, I get a record through the tailable cursor.
coll = db.collection('oplog.rs');
options = {
    tailable: true,
    awaitdata: true,
    numberOfRetries: -1
};
// Rule1: alert when "percentage" is less than 40
var qcond = {'o.data.percentage': {$lt: 40}};
coll.find(qcond, options, function(err, cur) {
    cur.each(function(err, doc) {
        // Perform some operations on the received document, like
        // adding it to another collection or generating an alert
    }); // cur.each
}); // find
Everything works fine till this point.
Now the problem starts when the end user wants to add another rule at runtime, say:
2) Rule2: Generate notification/alert if "rating" is greater than 8
Now I would like the tailable cursor query to consider this condition/rule as well. But the current cursor is already waiting on the conditions given for Rule1 only.
Is there any way to update the query conditions dynamically so that I can include conditions for Rule2 as well?
I tried searching but couldn't find a way to achieve this.
Does anyone have any suggestion/pointers to tackle this situation?
No. You can't modify a cursor once it's open on the server. You'll need to terminate the cursor and reopen it to cover both conditions, or open a second cursor to cover the second condition.
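To illustrate the "close and reopen" route, here is a rough sketch built on the same driver API the question uses; keeping the rules in an array and combining them with $or are assumptions about how you might organize it, not the only way to do it:
var currentCursor = null;

function openTailableCursor(conditions) {
    var coll = db.collection('oplog.rs');
    var options = { tailable: true, awaitdata: true, numberOfRetries: -1 };

    // one cursor covering all rules, combined with $or
    coll.find({ $or: conditions }, options, function (err, cur) {
        if (err) return console.error(err);
        currentCursor = cur;
        cur.each(function (err, doc) {
            if (err || !doc) return;
            // work out which rule(s) matched and generate the alert(s)
        });
    });
}

// initially only Rule1
var rules = [{ 'o.data.percentage': { $lt: 40 } }];
openTailableCursor(rules);

// later, when the end user adds Rule2 at runtime:
rules.push({ 'o.data.rating': { $gt: 8 } });
if (currentCursor) {
    // close the old cursor, then reopen with both conditions
    currentCursor.close(function () { openTailableCursor(rules); });
}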
The following schema is intended to record total views and views for a very specific day only.
const usersSchema = new Schema({
    totalProductsViews: {type: Number, default: 0},
    productsViewsStatistics: [{
        day: {type: String, default: new Date().toISOString().slice(0, 10), unique: true},
        count: {type: Number, default: 0}
    }],
});
So today's views will be stored in a different subdocument from yesterday's. To implement this I tried to use upsert, so that a subdocument would be created each day a product is viewed, with counts incremented and recorded per day. I tried to use the following function, but it does not seem to work the way I intended.
usersSchema.statics.increaseProductsViews = async function (id) {
    // Based on day only.
    const todayDate = new Date().toISOString().slice(0, 10);
    const result = await this.findByIdAndUpdate(id, {
        $inc: {
            totalProductsViews: 1,
            'productsViewsStatistics.$[sub].count': 1
        },
    },
    {
        upsert: true,
        arrayFilters: [{'sub.day': todayDate}],
        new: true
    });
    console.log(result);
    return result;
};
What do I miss to get the functionality I want? Any help will be appreciated.
What you are trying to do here actually requires you to understand some concepts you may not have grasped yet. The two primary ones being:
You cannot use any positional update as part of an upsert since it requires data to be present
Adding items into arrays mixed with "upsert" is generally something you cannot do in a single statement.
It's a little unclear if "upsert" is your actual intention anyway, or if you just presumed that was what you had to add in order to get your statement to work. It does complicate things if that is your intent, even if it's unlikely given the findByIdAndUpdate() usage, which would imply you were actually expecting the "document" to always be present.
At any rate, it's clear you actually expect to "update the array element when found, OR insert a new array element where not found". This is actually a two-write process, and three writes when you consider the "upsert" case as well.
For this, you actually need to invoke the statements via bulkWrite():
usersSchema.statics.increaseProductsViews = async function (_id) {
    // Based on day only.
    const todayDate = new Date().toISOString().slice(0, 10);
    await this.bulkWrite([
        // Try to match an existing element and update it ( do NOT upsert )
        {
            "updateOne": {
                "filter": { _id, "productsViewsStatistics.day": todayDate },
                "update": {
                    "$inc": {
                        "totalProductsViews": 1,
                        "productsViewsStatistics.$.count": 1
                    }
                }
            }
        },
        // Try to $push where the element is not there but the document is ( do NOT upsert )
        {
            "updateOne": {
                "filter": { _id, "productsViewsStatistics.day": { "$ne": todayDate } },
                "update": {
                    "$inc": { "totalProductsViews": 1 },
                    "$push": { "productsViewsStatistics": { "day": todayDate, "count": 1 } }
                }
            }
        },
        // Finally attempt an upsert where the "document" was not there at all,
        // only if you actually mean it - so optional
        {
            "updateOne": {
                "filter": { _id },
                "update": {
                    "$setOnInsert": {
                        "totalProductsViews": 1,
                        "productsViewsStatistics": [{ "day": todayDate, "count": 1 }]
                    }
                },
                "upsert": true
            }
        }
    ]);
    // return the modified document if you really must
    return this.findById(_id); // Not atomic, but the lesser of all evils
}
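For completeness, calling the static from application code would then look something like the sketch below; the User model name and the id variable are assumptions, since the compiled model isn't shown in the question:
// assuming the schema is compiled into a model elsewhere, e.g.
// const User = mongoose.model('User', usersSchema);
const updatedUser = await User.increaseProductsViews(someUserId); // inside an async function
console.log(updatedUser.totalProductsViews);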
So there's a good reason why the positional filtered [<identifier>] operator does not apply here: its intended purpose is to update multiple matching array elements, and you only ever want to update one. For that there is a specific operator, the positional $ operator. Its condition, however, must be included within the query predicate (the "filter" property in updateOne statements), just as demonstrated in the first two statements of the bulkWrite() above.
The main problem with the positional filtered [<identifier>] operator, as the first two statements show, is that you cannot alternate between the $inc and the $push depending on whether the document actually contains an array entry for the day. At best, no update is applied at all when the current day is not matched by the expression in arrayFilters.
At worst, an actual "upsert" will throw an error because MongoDB cannot decipher the "path name" from the statement, and of course you simply cannot $inc something that does not yet exist as a "new" array element. That needs a $push.
You also cannot do both the $inc and the $push within a single statement: MongoDB will error that you are attempting to "modify the same path", which is an illegal operation. Much the same applies to $setOnInsert, since whilst that operator only applies to "upsert" operations, it does not stop the other operators from running.
Thus the logical steps fall back to what the comments in the code also describe:
Attempt to match where the document contains an existing array element, then update that element. Using $inc in this case
Attempt to match where the document exists but the array element is not present and then $push a new element for the given day with the default count, updating other elements appropriately
If you actually did intend to upsert documents (not array elements, because that is what the above steps handle), then finally attempt an actual upsert, creating the new properties including a new array.
Finally there is the issue of the bulkWrite(). Whilst this is a single request to the server with a single response, it still is effectively three ( or two if that's all you need ) operations. There is no way around that and it is better than issuing chained separate requests using findByIdAndUpdate() or even updateOne().
Of course, the main operational difference from the code you attempted to implement is that bulkWrite() does not return the modified document. There is no way to get a "document response" from any "Bulk" operation at all.
As such, the "bulk" process will only ever modify the document with one of the three statements submitted, based on the presented logic and, importantly, the order of those statements. But if you actually want to "return the document" after modification, then the only way to do that is with a separate request to fetch it.
The only caveat here is that there is the small possibility that other modifications could have occurred to the document other than the "array upsert" since the read and update are separated. There really is no way around that, without possibly "chaining" three separate requests to the server and then deciding which "response document" actually applied the update you wanted to achieve.
So with that context it's generally considered the lesser of evils to do the read separately. It's not ideal, but it's the best option available from a bad bunch.
As a final note, I would strongly suggest storing the day property as a BSON Date instead of as a string. It takes fewer bytes to store and is far more useful in that form. As such, the following constructor is probably the clearest and least hacky:
const todayDate = new Date(new Date().setUTCHours(0,0,0,0))
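If you switch to a Date, the relevant part of the schema from the question might then look like the sketch below; the arrow-function default is an assumption about wanting the value computed per document at save time rather than once at schema definition:
const usersSchema = new Schema({
    totalProductsViews: { type: Number, default: 0 },
    productsViewsStatistics: [{
        // a real BSON Date truncated to midnight UTC
        day: { type: Date, default: () => new Date(new Date().setUTCHours(0, 0, 0, 0)) },
        count: { type: Number, default: 0 }
    }],
});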
I am trying to lazy load data from publicData using the options that BuildFire describes in the wiki. I have set up some code to test that it works, and it seems that it does not, no matter how I configure the request options. Here is the code that I am using:
var loadSortedPages = function(page) {
    var skip = page * 50;
    var options = {
        "filter": {},
        "sort": {"points": 1},
        "pageSize": "50",
        "skip": skip.toString()
    };
    buildfire.publicData.search(options, 'users', function(err, records) {
        console.log("RECORDS SORTED ASCENDING BY POINTS FOR PAGE " + page, records);
    });
};
loadSortedPages(0);
loadSortedPages(1);
loadSortedPages(2);
I have tried, it seems, every conceivable combination of "page" and "skip", both as strings and as numbers. Nothing works and I always get back the first 50 sorted records for each of the loadSortedPages calls, even though I am passing in different page numbers. Is this something on BuildFire's end?
Here is the documentation on how to use Datastore search https://github.com/BuildFire/sdk/wiki/How-to-use-Datastore#buildfiredatastoresearchoptions-tag-optional-callback
It seems like you are mixing the two pagination methods. For pagination you can either use:
page: the number of the page you want to retrieve.
pageSize: the number of records per page; the max value is 20.
Or use:
skip: the number of records you want to skip.
limit: the number of records to return for this call; the max value is 20.
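Based on that documentation, a sketch of the function from the question using the page/pageSize pair (rather than skip) could look like this; the 20-record cap comes from the limits quoted above:
var loadSortedPages = function (page) {
    var options = {
        filter: {},
        sort: { points: 1 },
        page: page,      // a number, not a string
        pageSize: 20     // documented maximum per page
    };
    buildfire.publicData.search(options, 'users', function (err, records) {
        if (err) return console.error(err);
        console.log('RECORDS SORTED ASCENDING BY POINTS FOR PAGE ' + page, records);
    });
};

loadSortedPages(0);
loadSortedPages(1);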
I have the following documents:
{
    "_id": "538584aad48c6cdc3f07a2b3",
    "startTime": "2014-06-12T21:30:00.000Z",
    "endTime": "2014-06-12T22:00:00.000Z"
},
{
    "_id": "538584b1d48c6cdc3f07a2b4",
    "startTime": "2014-06-12T22:30:00.000Z",
    "endTime": "2014-06-12T23:00:00.000Z"
}
All of them have startTime and endTime value. I need to maintain consistency that no two date spans in the collection overlap.
Let's say I add the following document with the following dates:
db.collection.insert({
    "startTime": "2014-06-12T19:30:00.000Z",
    "endTime": "2014-06-12T21:45:00.000Z"
});
This date span insert should fail because it overlaps with an existing interval.
My questions are:
How to check for date span overlap?
How to check and insert with a single query?
EDIT: to avoid creating a duplicate, I am asking here and have started a bounty. I need to perform the update operation using a single query, as described here: How to query and update document by using single query?
The query is not as complicated as it may look at first - the query to find all documents which "overlap" the range you are given is:
db.test.find( { "startTime" : { "$lt" : new_end_time },
"endTime" : { "$gt": new_start_time }
}
)
This will match any document with starting date earlier than our end date and end date greater than our start time. If you visualize the ranges as being points on a line:
-----|*********|----------|****|-----------|******||********|---
s1 e1 s2 e2 s3 e3s4 e4
the sX-eX pairs represent existing ranges. If you take a new pair s5-e5, you can see that if we eliminate the pairs that start after our end date (they can't overlap us), then eliminate all pairs that end before our start date, and nothing is left, we are good to insert.
Put another way: does the union of all documents whose end date is $lte our start and all documents whose start date is $gte our end account for every document already in the collection? Our query flips this around to find any documents that satisfy the opposite of this condition - that is, the ones that overlap us.
On the performance front, it's unfortunate that you are storing your dates as strings only. If you stored them as timestamps (or any number, really) you could make this query utilize indexes better. As it is, for performance you would want to have an index on { "startTime":1, "endTime":1 }.
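For example, creating that index in the mongo shell is a one-liner (assuming the same test collection used in the queries here):
// compound index supporting the range conditions on both fields
db.test.ensureIndex({ "startTime": 1, "endTime": 1 })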
It's simple to find whether the range you want to insert overlaps any existing ranges, but to your second question:
How to check and insert with a single query?
There is no proper way to do it with an insert, since inserts do not take a query (i.e. they are not conditional).
However, you can use an update with the upsert option. It can insert if the condition doesn't match anything, but if it does match, it will try to update the matched document!
So the trick is to make the update a no-op and set the fields you need on upsert only. Since 2.4 there is a $setOnInsert operator for update. The full thing would look something like this:
db.test.update(
{ startTime: { "$lt" : new_end_time }, "endTime" : { "$gt": new_start_time } },
{ $setOnInsert:{ startTime:new_start_time, endTime: new_end_time}},
{upsert:1}
)
WriteResult({
"nMatched" : 0,
"nUpserted" : 1,
"nModified" : 0,
"_id" : ObjectId("538e0f6e7110dddea4383938")
})
db.test.update(
{ startTime:{ "$lt" : new_end_time }, "endTime" : { "$gt": new_start_time } },
{ $setOnInsert:{ startTime:new_start_time, endTime: new_end_time}},
{upsert:1}
)
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 0 })
I just ran the same "update" twice - the first time there were no overlapping documents, so the update performed an "upsert", which you can see in the WriteResult it returned.
When I ran it a second time, it would overlap (itself, of course) so it tried to update the matched document, but noticed there was no work to do. You can see the returned nMatched is 1 but nothing was inserted or modified.
This query should return all documents that overlap in some way with the new startTime/endTime values.
db.test.find({"$or":[
{"$and":[{"startTime":{"$lte":"new_start_time"}, "endTime":{"$gte":"new_start_time"}}, //new time has an old startTime in the middle
{"startTime":{"$lte":"new_end_time"}, "endTime":{"$lte":"new_end_time"}}]},
{"$and":[{"startTime":{"$gte":"new_start_time"}, "endTime":{"$gte":"new_start_time"}}, //new time sorounds and old time
{"startTime":{"$lte":"new_end_time"}, "endTime":{"$lte":"new_end_time"}}]},
{"$and":[{"startTime":{"$gte":"new_start_time"}, "endTime":{"$gte":"new_start_time"}}, //an old time has the new endTime in the middle
{"startTime":{"$lte":"new_end_time"}, "endTime":{"$gte":"new_end_time"}}]},
{"$and":[{"startTime":{"$lte":"new_start_time"}, "endTime":{"$gte":"new_start_time"}}, //new time is within an old time
{"startTime":{"$lte":"new_end_time"}, "endTime":{"$gte":"new_end_time"}}]}
]})
You want to run both queries at the same time, which means you want them to run synchronously in your code. Visit this question, it may help with your answer:
Synchronous database queries with Node.js
I have a MongoDB collection with more than 1,000,000 documents, and I would like to update each document, one by one, with dedicated information (each doc has information coming from another collection).
Currently I'm using a cursor that fetches all the data from the collection, and then I update each record through the async module of Node.js.
Fetch all docs:
inst.db.collection(association.collection, function(err, collection) {
    collection.find({}, {}, function(err, cursor) {
        cursor.toArray(function(err, items){
            ......
        });
    });
});
Update each doc:
items.forEach(function(item) {
    // *** do some stuff with item, add field etc.
    tasks.push(function(nextTask) {
        inst.db.collection(association.collection, function(err, collection) {
            if (err) return callback(err, null);
            collection.save(item, nextTask);
        });
    });
});
call the "save" task in parallel
async.parallel(tasks, function(err, results) {
callback(err, results);
});
How would you do this type of operation in a more efficient way? I mean, how can I avoid the initial "find" used to load a cursor with everything? Is there a way to operate doc by doc, knowing that all docs should be updated?
Thanks for your support.
Your question inspired me to create a Gist to do some performance testing of different approaches to your problem.
Here are the results, running on a small EC2 instance with MongoDB at localhost. The test scenario is to uniquely operate on every document of a 100,000-element collection.
108.661 seconds -- Uses find().toArray to pull in all the items at once then replaces the documents with individual "save" calls.
99.645 seconds -- Uses find().toArray to pull in all the items at once then updates the documents with individual "update" calls.
74.553 seconds -- Iterates on the cursor (find().each) with batchSize = 10, then uses individual update calls.
58.673 seconds -- Iterates on the cursor (find().each) with batchSize = 10000, then uses individual update calls.
4.727 seconds -- Iterates on the cursor with batchSize = 10000, and does inserts into a new collection 10000 items at a time.
Though not included, I also did a test with MapReduce used as a server side filter which ran at about 19 seconds. I would have liked to have similarly used "aggregate" as a server side filter, but it doesn't yet have an option to output to a collection.
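As a rough illustration only, the MapReduce variant mentioned above would look something like this; the collection names and the added field are placeholders, and note that mapReduce reshapes its output into { _id, value } documents, so it is not a drop-in replacement:
db.source.mapReduce(
    function () {
        // emit each document keyed by _id, with whatever per-document change is needed
        emit(this._id, { newField: "some value", original: this });
    },
    function (key, values) {
        // each key is emitted exactly once, so just pass the single value through
        return values[0];
    },
    { out: { replace: "target" } }
);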
The bottom line answer is that if you can get away with it, the fastest option is to pull items from an initial collection via a cursor, update them locally and insert them into a new collection in big chunks. Then you can swap in the new collection for the old.
If you need to keep the database active, then the best option is to use a cursor with a big batchSize, and update the documents in place. The "save" call is slower than "update" because it needs to replace whole document, and probably needs to reindex it as well.
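A minimal sketch of that fastest approach, reusing the callback-style driver API from the question; the target collection name, the batch size, and the per-item change are placeholders:
var BATCH = 10000;
var buffer = [];

inst.db.collection(association.collection, function (err, source) {
    inst.db.collection(association.collection + '_new', function (err, target) {
        source.find({}).batchSize(BATCH).each(function (err, item) {
            if (err) return console.error(err);
            if (item === null) {
                // cursor exhausted: flush whatever is left, then swap the collections
                if (buffer.length) target.insert(buffer, function (err) { /* done */ });
                return;
            }
            // *** do some stuff with item, add field etc.
            buffer.push(item);
            if (buffer.length === BATCH) {
                var chunk = buffer;
                buffer = [];
                target.insert(chunk, function (err) { /* handle error */ });
            }
        });
    });
});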
I have some Mongoose models with geospatial indexes:
var User = new Schema({
"name" : String,
"location" : {
"id" : String,
"name" : String,
"loc" : { type : Array, index : '2d'}
}
});
I'm trying to update all items that are in an area - for instance:
User.update({ "location.loc" : { "$near" : [ -122.4192, 37.7793 ], "$maxDistance" : 0.4 } }, { "foo" : "bar" },{ "multi" : true }, function(err){
console.log("done!");
});
However, this appears to only update the first 100 records. Looking at the docs, it appears there is a native limit on finds over geospatial indexes that applies when you don't set a limit.
(from docs:
Use limit() to specify a maximum number of points to return (a default limit of 100 applies if unspecified))
This appears to also apply to updates, regardless of the multi flag, which is a giant drag. If I apply an update, it only updates the first 100.
Right now the only way I can think of to get around this is to do something hideous like this:
Model.find({"location.loc" : { "$near" : [ -122.4192, 37.7793 ], "$maxDistance" : 0.4 } },{limit:0},function(err,results){
var ids = results.map(function(r){ return r._id; });
Model.update({"_id" : { $in : ids }},{"foo":"bar"},{multi:true},function(){
console.log("I have enjoyed crippling your server.");
});
});
While I'm not even entirely sure that would work (and it could be mildly optimized by only selecting the _id), I'd really like to avoid keeping an array of n ids in memory, as that number could get very large.
Edit:
The above hack doesn't even work, looks like a find with {limit:0} still returns 100 results. So, in an act of sheer desperation and frustration, I have written a recursive method to paginate through ids, then return them so I can update using the above method. I have added the method as an answer below, but not accepted it in hopes that someone will find a better way.
This is a problem in mongo server core as far as I can tell, so mongoose and node-mongodb-native are not to blame. However, this is really stupid, as geospatial indexes are one of the few reasons to use mongo over some other more robust NoSQL stores.
Is there a way to achieve this? Even in node-mongodb-native, or the mongo shell, I can't seem to find a way to set (or in this case, remove by setting to 0) a limit on an update.
I'd love to see this issue fixed, but I can't figure out a way to set a limit on an update, and after extensive research it doesn't appear to be possible. In addition, the hack in the question doesn't even work; I still only get 100 records with a find and limit set to 0.
Until this is fixed in mongo, here's how I'm getting around it: (!!WARNING: UGLY HACKS AHEAD:!!)
var getIdsPaginated = function(query, batch, callback){
    // set a default batch if it isn't passed.
    if(!callback){
        callback = batch;
        batch = 10000;
    }
    // define our array and a find method we can call recursively.
    var all = [],
        find = function(skip){
            // skip defaults to 0
            skip = skip || 0;
            this.find(query, ['_id'], {limit: batch, skip: skip}, function(err, items){
                if(err){
                    // if an error is thrown, call back with it and how far we got in the array.
                    callback(err, all);
                } else if(items && items.length){
                    // if we returned any items, grab their ids and put them in the 'all' array
                    var ids = items.map(function(i){ return i._id.toString(); });
                    all = all.concat(ids);
                    // recurse
                    find.call(this, skip + batch);
                } else {
                    // we have recursed and not returned any ids. This means we have them all.
                    callback(err, all);
                }
            }.bind(this));
        };
    // start the recursion
    find.call(this);
}
This method will return a giant array of _ids. Because they are already indexed, it's actually pretty fast, but it's still calling the db many more times than is necessary. When this method calls back, you can do an update with the ids, like this:
Model.update({"_id": {"$in": ids}},{'foo':'bar'},{multi:true},function(err){ console.log('hooray, more than 100 records updated.'); });
This isn't the most elegant way to solve this problem; you can tune its efficiency by setting the batch size based on expected results, but obviously the ability to simply call update (or find, for that matter) on $near queries without a limit would really help.