I am encountering a delay of 5 to 10 seconds from when an operation happens in MongoDB until I capture it in a Change Stream in Node.js.
Are these times normal? What parameters could I check to see whether any of them are impacting this?
Here are a couple of examples and some suspicions (to be tested).
Here we try to catch changes only in the fields of the Users collection that interest us. I don't know whether doing this to avoid unwanted events could be causing the delay in receiving the change stream, and whether it would be better to receive more events and filter the updated fields in code (a sketch of that alternative follows after the pipeline below).
I also don't know whether the $and condition on the operation type should come first, or whether that is irrelevant.
userChangeStreamQuery: [{
  $match: {
    $and: [
      { $or: [
        { "updateDescription.updatedFields.name": { $exists: true } },
        { "updateDescription.updatedFields.email": { $exists: true } },
        { "updateDescription.updatedFields.organization": { $exists: true } },
        { "updateDescription.updatedFields.displayName": { $exists: true } },
        { "updateDescription.updatedFields.image": { $exists: true } },
        { "updateDescription.updatedFields.organizationName": { $exists: true } },
        { "updateDescription.updatedFields.locationName": { $exists: true } }
      ]},
      { operationType: "update" }
    ]
  }
}],
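For comparison, a minimal sketch of the alternative mentioned above: match only on the operation type server-side and filter the updated fields in application code. This uses the raw driver API rather than the nativeMongoDbFactory wrapper, and the field list is only illustrative; whether it actually changes the latency would have to be measured.
// Sketch: broader $match, field filtering done in application code.
const pipeline = [{ $match: { operationType: 'update' } }];
const watchedFields = ['name', 'email', 'organization', 'displayName',
                       'image', 'organizationName', 'locationName'];
const changeStream = db.collection('users').watch(pipeline);
changeStream.on('change', (event) => {
  const updated = Object.keys((event.updateDescription || {}).updatedFields || {});
  if (updated.some((field) => watchedFields.includes(field))) {
    // handle the relevant change here
  }
});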
In this other one, which waits for events on the Plans collection, my concern is that it has no aggregation pipeline defined, and it is only on receiving the event that the code filters whether the operation is of type 'insert', 'update' or 'delete' (a sketch of moving that filter into the pipeline follows after the code below). This one is giving us a delay of 7~10 seconds.
startChangeStream({
streamId: 'plans',
collection: 'plans',
query: '',
resumeTokens
});
...
const startChangeStream = ({ streamId, collection, query, resumeTokens }) => {
const resumeToken = resumeTokens ? resumeTokens[streamId] || undefined : undefined;
nativeMongoDbFactory.setChangeStream({
streamId,
collection,
query,
resumeToken
});
}
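If the nativeMongoDbFactory wrapper forwards query to collection.watch() as the aggregation pipeline (an assumption on my part), the operation-type filtering for plans could be moved server-side instead of being done on every received event, for example:
// Assumes `query` is passed through as the change stream pipeline.
startChangeStream({
  streamId: 'plans',
  collection: 'plans',
  query: [{ $match: { operationType: { $in: ['insert', 'update', 'delete'] } } }],
  resumeTokens
});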
In no case are these massive operations; normally they are operations performed by users through web forms.
When the collection is sharded, a change stream opened through mongos has to wait until every shard has data to return. If some shards have no data to write, each idle shard's primary mongod writes a no-op to the oplog every 10 seconds (idlewriteperiodms). That is why your delay is 7~10 seconds.
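If you want to verify this on your deployment, the server parameter that controls the interval is usually periodicNoopIntervalSecs (the exact name depends on your MongoDB version, so treat it as an assumption); a quick check from the mongo shell could look like this:
// Parameter name is an assumption; run against each shard's primary and verify for your version.
db.adminCommand({ getParameter: 1, periodicNoopIntervalSecs: 1 });
// e.g. { "periodicNoopIntervalSecs" : 10, "ok" : 1 }
// It can typically only be changed at startup, e.g.:
//   mongod --setParameter periodicNoopIntervalSecs=1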
Related
I need to update/save documents with sizes between 100KB and 800KB. Update operations like console.time('save'); await doc.findByIdAndUpdate(...).lean(); console.timeEnd('save'); are taking 5s - 10s to finish, even though the updates contain ~50KB at most.
The large document property which is being updated has a structure like so:
{
largeProp: [{
key1: { key1A:val, key1B:val, ... 10 more ... },
key2: { key1A:val, key1B:val, ... 10 more ... },
key3: { key1A:val, key1B:val, ... 10 more ... },
...300 more...
}, ...100 more... ]
}
I'm using a Node.js server on an Ubuntu VM with mongoose.js, with MongoDB hosted on a separate server. The MongoDB server does not show any unusual load; it usually stays under 7% CPU. However, my Node.js server hits 100% CPU usage with just this update operation (after a .findById() and some quick logic, 8ms-52ms). The .findById() takes about 500ms - 1s for this same object.
I need these saves to be much faster, and I don't understand why this is so slow.
I did not do much more profiling on the Mongoose query. Instead I tested out a native MongoDB query and it significantly improved the speed, so I will be using native MongoDB going forward.
const {ObjectId} = mongoose.Types;
let result = await mongoose.connection.collection('collection1')
.aggregate([
{ $match: { _id: ObjectId(gameId) } },
{ $lookup: {
localField:'field1',
from:'collection2',
foreignField:'_id',
as:'field1'
}
},
{ $unwind: '$field1' },
{ $project: {
    _id: 1,
    status: 1,
    createdAt: 1,
    slowArrProperty: { $slice: ["$positions", -1] },
    updatedAt: 1
  }
},
{ $unwind: "$slowArrProperty" }
]).toArray();
if (result.length < 1) return {};
return result[0];
This query, along with some restructuring of my data model, solved my issue. Specifically, for the document property that was very large and causing issues, I used { $slice: ["$positions", -1] } as above to return only one of the objects in the array at a time.
Just from switching to native MongoDB queries (within the mongoose wrapper), I saw between 60x and 3000x improvements on query speeds.
I'm running on Mongo 3.6.6 (on a small Mongo Atlas cluster, not sharded) using the native Node JS driver (v. 3.0.10)
My code looks like this:
const records = await collection.find({
userId: ObjectId(userId),
status: 'completed',
lastUpdated: {
$exists: true,
$gte: '2018-06-10T21:24:12.000Z'
}
}).toArray();
I'm seeing this error occasionally:
{
"name": "MongoError",
"message": "cursor id 16621292331349 not found",
"ok": 0,
"errmsg": "cursor id 16621292331349 not found",
"code": 43,
"codeName": "CursorNotFound",
"operationTime": "6581469650867978275",
"$clusterTime": {
"clusterTime": "6581469650867978275",
"signature": {
"hash": "aWuGeAxOib4XWr1AOoowQL8yBmQ=",
"keyId": "6547661618229018626"
}
}
}
This is happening for queries that return a few hundred records at most. The records are a few hundred bytes each.
I looked online for what the issue might be but most of what I found is talking about cursor timeouts for very large operations that take longer than 10 minutes to complete. I can't tell exactly how long the failed queries took from my logs, but it's at most two seconds (probably much, much shorter than that).
I tested running the query with the same values as one that errored out and the execution time from explain was just a few milliseconds:
"executionStats" : {
"executionSuccess" : true,
"nReturned" : NumberInt(248),
"executionTimeMillis" : NumberInt(3),
"totalKeysExamined" : NumberInt(741),
"totalDocsExamined" : NumberInt(741),
"executionStages" : {...}
},
"allPlansExecution" : []
]
}
Any ideas? Could intermittent network latency cause this error? How would I mitigate that? Thanks
You can try these 3 things:
a) Disable the cursor timeout:
db.collection.find().noCursorTimeout();
You must close the cursor at some point with cursor.close();
b) Or reduce the batch size
db.inventory.find().batchSize(10);
c) Retry when the cursor expires:
let processed = 0;
let updated = 0;
while(true) {
const cursor = db.snapshots.find().sort({ _id: 1 }).skip(processed);
try {
while (cursor.hasNext()) {
const doc = cursor.next();
++processed;
if (doc.stream && doc.roundedDate && !doc.sid) {
db.snapshots.update({
_id: doc._id
}, { $set: {
sid: `${ doc.stream.valueOf() }-${ doc.roundedDate }`
}});
++updated;
}
}
break; // Done processing all, exit outer loop
} catch (err) {
if (err.code !== 43) {
// Something else than a timeout went wrong. Abort loop.
throw err;
}
}
}
First of all, if your data is too big, it's not a good idea to use the toArray() method; instead, it's better to use forEach() and loop through the data.
Like this:
const records = await collection.find({
userId: ObjectId(userId),
status: 'completed',
lastUpdated: {
$exists: true,
$gte: '2018-06-10T21:24:12.000Z'
}
});
records.forEach((record) => {
//do something ...
});
Second, you can use the {allowDiskUse: true} option for getting large data.
const records = await collection.find({
userId: ObjectId(userId),
status: 'completed',
lastUpdated: {
$exists: true,
$gte: '2018-06-10T21:24:12.000Z'
}
},
{allowDiskUse: true});
Is there any way to perform this query in Mongoose?
This multi update is possible from MongoDB v2.6:
{
update: <collection>,
updates:
[
{ q: <query>, u: <update>, upsert: <boolean>, multi: <boolean> },
{ q: <query>, u: <update>, upsert: <boolean>, multi: <boolean> },
{ q: <query>, u: <update>, upsert: <boolean>, multi: <boolean> },
...
],
ordered: <boolean>,
writeConcern: { <write concern> }
}
I found this topic, but it's pretty old: Mongodb multi update with different value
Thanks everyone for the suggestions.
From the details that you provided, I assume that you would like to issue a series of update queries based on several different criteria and specific update values for each particular query.
Nevertheless, I will address both possible scenarios. If you would like to update multiple documents in MongoDB, there are two options:
Update multiple documents that match one set of specific criteria, in which case you can use db.collection.update() and specify the multi parameter when you fire your operation (Official MongoDB Docs).
Use bulk.find().update() to chain several multi update operations and execute them in bulk (Official MongoDB Docs); a Mongoose-specific sketch follows below.
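Since the question asks specifically about Mongoose: recent Mongoose versions also expose Model.bulkWrite(), which sends a single bulk update command under the hood. A minimal sketch (the model and field names are hypothetical):
// Hypothetical model and fields; each entry becomes one update in a single bulk command.
await ItemModel.bulkWrite([
  { updateOne: { filter: { id: 1 }, update: { $set: { name: 'first' } } } },
  { updateOne: { filter: { id: 2 }, update: { $set: { name: 'second' } } } },
  { updateMany: { filter: { archived: true }, update: { $set: { name: 'archived' } } } }
], { ordered: false });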
OK, these are the results.
var p_text_data = require("../public/import/p_100_k_text_cz.json"); //data with new vals
_und.map(p_text_data,function(vals,index) {
var new_name = vals.name + something;
e_textModel.collection.update({ 'id': vals.id }, { $set: { 'name': new_name } }, {upsert: false, multi: true});
});
Using a single update for every iteration, updating 100k documents takes about 7+ seconds,
while the following code, using bulk, takes less than 3 seconds:
var bulk = e_textModel.collection.initializeUnorderedBulkOp();
_und.map(p_text_data,function(vals,index) {
ite++;
var new_name = vals.name + something;
bulk.find( { 'id': vals.id } ).update( { $set: { 'name': new_name } } );
});
bulk.execute();
EDIT:
var bulk = e_textModel.collection.initializeOrderedBulkOp();
is even a couple of times faster than UnorderedBulkOp in this case.
I am in a situation where I have to update either two documents or none of them. How is it possible to implement such behavior with Mongo?
// nodejs mongodb driver
Bus.update({
"_id": { $in: [ObjectId("abc"), ObjectId("def")] },
"seats": { $gt: 0 }
}, {
$inc: { "seats": -1 }
}, { multi: true }, function(error, update) {
assert(update.result.nModified === 2)
})
The problem with the code above is that it will update even if only one bus matches. In my case I am trying to book a ticket for a bus in both directions, and the booking should fail if at least one of the buses is already fully booked.
Thank you
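For what it's worth, on MongoDB 4.0+ with a replica set, one way to get this all-or-nothing behavior is a multi-document transaction that aborts when fewer than two seats were decremented. A rough sketch with the native driver (the client, the buses collection handle and the two ids are assumed, not taken from the question):
// Sketch only: requires MongoDB 4.0+ and a replica set (or a sharded cluster on 4.2+).
const session = client.startSession();
try {
  await session.withTransaction(async () => {
    const result = await buses.updateMany(
      { _id: { $in: [idThere, idBack] }, seats: { $gt: 0 } },
      { $inc: { seats: -1 } },
      { session }
    );
    if (result.modifiedCount !== 2) {
      // Not both buses had a free seat: throwing aborts the transaction,
      // so neither seat count is changed.
      throw new Error('One of the buses is fully booked');
    }
  });
} finally {
  await session.endSession();
}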
I am trying to aggregate some records in a mongo database using the node driver. I am first matching to org, fed, and sl fields (these are indexed). If I only include a few companies in the array that I am matching the org field to, the query runs fine and works as expected. However, when including all of the clients in the array, I always get:
MongoError: getMore: cursor didn't exist on server, possible restart or timeout?
I have tried playing with the allowDiskUse, and the batchSize settings, but nothing seems to work. With all the client strings in the array, the aggregation runs for ~5hours before throwing the cursor error. Any ideas? Below is the pipeline along with the actual aggregate command.
setting up the aggregation pipeline:
var aggQuery = [
{
$match: { //all clients, from last three days, and scored
org:
{ $in : array } //this is the array I am talking about
,
frd: {
$gte: _.last(util.lastXDates(3))
},
sl : true
}
}
, {
$group: { //group by isp and make fields for calculation
_id: "$gog",
count: {
$sum: 1
},
countRisky: {
    $sum: {
        $cond: {
            if: { $gte: ["$scr", 65] },
            then: 1,
            else: 0
        }
    }
},
countTimeZoneRisky: {
    $sum: {
        $cond: {
            if: { $eq: ["$gmt", "$gtz"] },
            then: 0,
            else: 1
        }
    }
}
}
}
, {
$match: { //show records with count >= 500
count: {
$gte: 500
}
}
}
, {
$project: { //rename _id to isp, only show relevent fields
_id: 0,
ISP: "$_id",
percentRisky: {
$multiply: [{
$divide: ["$countRisky", "$count"]
},
100
]
},
percentTimeZoneDiscrancy: {
$multiply: [{
$divide: ["$countTimeZoneRisky", "$count"]
},
100
]
},
count: 1
}
}
, {
$sort: { //sort by percent risky and then by count
percentRisky: 1,
count: 1
}
}
];
Running the aggregation:
var cursor = reportingCollections.hitColl.aggregate(aggQuery, {
allowDiskUse: true,
cursor: {
batchSize: 40000
}
});
console.log('Writing data to csv ' + currentFileNamePrefix + '!');
//iterate through cursor and write documents to CSV
cursor.each(function (err, document) {
//write each document to csv file
//maybe start a nuclear war
});
You're calling the aggregate method, which doesn't return a cursor by default (unlike e.g. find()). To get the result back as a cursor, you must add the cursor option to the options object. However, a timeout setting for the aggregation cursor is (currently) not supported; the native Node.js driver only supports the batchSize setting.
You would set the batch size like this:
var cursor = coll.aggregate(query, {cursor: {batchSize:100}}, writeResultsToCsv);
To circumvent such problems, I'd recommend running the aggregation or map-reduce directly through the mongo client, where you can add the notimeout option.
The default timeout is 10 minutes (obviously useless for long, time-consuming queries) and, as far as I know, there's currently no way to set a different one, only to make it infinite with the aforementioned option. The timeout hits you especially with high batch sizes, because it takes more than 10 minutes to process the incoming docs, and by the time you ask the mongo server for more, the cursor has already been deleted.
I don't know your use case, but if it's a web view, there should only be fast queries/aggregations.
BTW I think this didn't change with 3.0.*