Is there a way to set a fixed limit on the number of documents that can be inserted via a bulk insert in mongodb using the node.js client?
I am inserting a number of documents via a bulk insert into a collection that has a unique index on fieldA. Some of the inserts will fail because fieldA is not unique, so I can't know beforehand how many will succeed, but I want to cap nInserted so that the total number of these documents never goes over 5000.
All I can think to do is run the full insert and, if nInserted brings the total above 5000, remove the last n inserted documents so that the total is back to 5000, but that seems a bit silly.
The ordered bulk insert is almost right, but I don't want it to stop at the first index conflict; it should keep going as long as there is still room (i.e. fewer than 5000 total).
Here's an example of what I'm trying to achieve:
db.myCol.count({foo: val}, function(err, count) {
    var remaining = 5000 - count;
    if (remaining > 0) {
        var bulk = db.myCol.initializeUnorderedBulkOp();
        toInsert.forEach(function(item) {
            bulk.insert(item);
        });
        // make sure no more than remaining is inserted
        bulk.execute(function(err, result) {
            // currently, I would just insert all and
            // then remove the overflow with another db action
            // if nInserted + count > 5000
        });
    }
});
Currently there is no way to tell the Bulk API to stop inserting any records once the limit of successful inserts has been reached.
One way of doing it on the client side:
Feed the Bulk API at most n (5000 in this case) documents at a time.
If any error occurred during the insert, bulk insert the remaining documents.
Do this recursively.
You can further add logic to process only the remaining number of records when remaining < max.
Modified code:
var toInsert = [..]; // documents to be inserted.
var max = 5000;      // max total records allowed in the collection.

function execBulk(start, end) {
    db.myCol.count({foo: 'bar'}, function(err, count) {
        var remaining = max - count;
        if (remaining > 0 && toInsert.length > start) {
            var bulk = db.myCol.initializeUnorderedBulkOp();
            toInsert.slice(start, end).forEach(function(item) {
                bulk.insert(item);
            });
            // insert the records
            bulk.execute(function(err, result) {
                if (err) {
                    console.log(err);
                    // some inserts failed (e.g. duplicate keys), so there is
                    // still room; insert the next set of at most 5000 records
                    execBulk(end, end + max);
                } else {
                    console.log(result);
                }
            });
        }
    });
}
Invoking the function:
execBulk(0,max);
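To add the refinement mentioned above and never queue more documents than there is room for, the slice can be capped at remaining before executing. A minimal sketch, reusing toInsert, max and the db handle from the code above (execBulkCapped is just an illustrative name):

function execBulkCapped(start) {
    db.myCol.count({foo: 'bar'}, function(err, count) {
        var remaining = max - count;                 // room left before the 5000 cap
        if (remaining <= 0 || start >= toInsert.length) return;
        var end = start + remaining;                 // queue no more than fits
        var bulk = db.myCol.initializeUnorderedBulkOp();
        toInsert.slice(start, end).forEach(function(item) {
            bulk.insert(item);
        });
        bulk.execute(function(err, result) {
            if (err) {
                // some writes failed on the unique index, so there may
                // still be room; re-count and try the next slice
                execBulkCapped(end);
            }
        });
    });
}

execBulkCapped(0);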
Related
I am using Node.js, with PostgreSQL as the back-end DB and Sequelize as the ORM.
I want to add a list of records to a table. The records are an array of objects.
I am iterating over this array and pushing each record into the database, but sometimes the order is not maintained.
Can you suggest some other way to fix this? I want to add the records one by one, serially.
var users = <array of users>;
var createdUsers = [];
for (var Index = 0; Index < users.length; Index++) {
    logger.debug("Insert User" + users[Index].user_name);
    // create() is asynchronous, so these inserts run concurrently
    // and may complete in any order
    models.User.create(users[Index]).then(function (user) {
        logger.debug("Inserted User" + user.user_name);
        createdUsers.push(user);
        if (createdUsers.length === users.length) {
            response.status(200).json(createdUsers);
        }
    }).catch(function (error) {
        response.status(500).json(error);
    });
}
users contains [{user_name:"AAA"},{user_name:"BBB"},{user_name:"CCC"},{user_name:"DDD"},{user_name:"EEE"},{user_name:"FFF"}].
After the insert, the order will sometimes be BBB, AAA, FFF, EEE, DDD.
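One way to keep the inserts serial is to chain the promises returned by create() so each insert starts only after the previous one has finished. A minimal sketch using a plain reduce, reusing the users, createdUsers, models.User, logger and response objects from the question:

users.reduce(function(previous, userData) {
    return previous.then(function() {
        // start this insert only after the previous one has resolved
        return models.User.create(userData).then(function(user) {
            logger.debug("Inserted User" + user.user_name);
            createdUsers.push(user);
        });
    });
}, Promise.resolve()).then(function() {
    // all inserts finished, in the original array order
    response.status(200).json(createdUsers);
}).catch(function(error) {
    response.status(500).json(error);
});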
I am implementing pagination.
What I am doing is counting the items in the collection first; once the count comes back, I do another find with skip and limit.
Now I want to attach the count to the data returned by the second query, which I am not able to do.
I tried toObject(), but I get the error "toObject() is not a function".
I don't want to use any library.
ErrorReportModel.find().count(function(err, count) {
    if (err) {
        console.log(err);
        return utile.internalError(res);
    }
    ErrorReportModel.find().sort({
        date_create: -1
    }).limit(20).skip(query.page * 20).lean().exec(function(err, data) {
        if (err) {
            console.log(err);
            return utile.internalError(res);
        }
        // I am doing this at the moment
        var myData = {};
        myData.data = data;
        myData.count = count;
        utile.response(res, myData);
    });
});
I want to send the count to the client side because I want to display page-number buttons depending on that count.
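The lean() call already returns plain JavaScript objects (which is why there is no toObject() on them), so the count can simply be attached as above. A minimal sketch of the response shape, reusing count, data, utile and res from the code above; the pages field and the pageSize name are additions for illustration:

var pageSize = 20;                        // matches .limit(20) above
utile.response(res, {
    data: data,                           // the 20 documents for this page
    count: count,                         // total documents in the collection
    pages: Math.ceil(count / pageSize)    // number of page buttons to render
});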
I have a collection of Codes that I am populating from a CSV. There are 1.5M codes in total, quite a few. The CSV is readily parsed into an object of the form:
codes = [
    {code: 'abc'},
    {code: '123'},
    etc
]
I initially tried writing this to Mongo in one insert, like so
Code.collection.insert(codes)
(using Mongoose to compose the query directly).
However, this failed silently. Assuming some kind of hidden memory issue, I began chunking my code, and found that Mongo 2.6 (running locally on my 16GB MacBook, no replica sets) would accept around 1000 codes in a single insert.
Is this expected behaviour, and if so, is there any rationale to this number?
Try inserting using the Bulk Operation methods; in particular you would need the db.collection.initializeOrderedBulkOp() method, which can be exposed within Mongoose using the Model.collection accessor object. Thus, in the above you could restructure your inserts to do a bulk update as follows:
var bulk = Code.collection.initializeOrderedBulkOp(),
    counter = 0;

codes.forEach(function(obj) {
    bulk.find({'code': {'$ne': null}}/* some search */)
        .update({'$set': {'code': obj.code}});

    counter++;

    if (counter % 1000 == 0) {
        // send the current batch of 1000 and start a fresh one
        bulk.execute(function(err, result) {
            bulk = Code.collection.initializeOrderedBulkOp();
        });
    }
});

if (counter % 1000 != 0) {
    // flush any operations left in the final, partial batch
    bulk.execute(function(err, result) {
        // get stats
    });
}
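Since the question is about plain inserts rather than updates, the same 1000-per-batch pattern can queue bulk.insert() operations instead. A minimal sketch, assuming the same Code model and codes array:

var bulk = Code.collection.initializeOrderedBulkOp(),
    counter = 0;

codes.forEach(function(obj) {
    bulk.insert(obj);    // queue the document rather than an update
    counter++;

    if (counter % 1000 === 0) {
        // flush every 1000 queued inserts, then start a fresh batch
        bulk.execute(function(err, result) {
            bulk = Code.collection.initializeOrderedBulkOp();
        });
    }
});

if (counter % 1000 !== 0) {
    // flush whatever is left over
    bulk.execute(function(err, result) {
        // check result.nInserted here if needed
    });
}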
How to prevent re-entering an operation? For example:
var updateValue = function(cb) {
    readValFromDB(function(err, val) {
        if (err) { /* ... */ }
        val = val + 1;
        saveValToDB(function(err2, val) {
            if (err2) { /* ... */ }
            cb();
        });
    });
};

for (var i = 0; i < 100; i++) {
    updateValue(function() {
    });
}
Say there is a value in a field in the database, with an initial value of 0. After 100 updates, I expect the value in the database to be 100.
In Node, however, after 100 updates the value in the database might be 1, 2, 3 or some other seemingly random value, because readValFromDB() can be executed many times before the first saveValToDB() is called. In other words, there is no serialization here.
Is there a way in Node to make sure the value is 100 after 100 updates?
The problem you're experiencing is a race condition. The value is being read and written all at the same time.
You can do all the writes sequentially (see async#eachSeries).
Usually a better solution is to do the increment on the database server. See the mongodb $inc operator and the redis INCR command as examples of an atomic increment.
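A minimal sketch of the atomic-increment approach with the MongoDB Node.js driver (2.x or later, where updateOne is available); the counters collection name and the counterId key are just illustrative:

// each call increments the stored value by 1 atomically on the server,
// so 100 concurrent calls always end at the initial value + 100
function updateValue(cb) {
    db.collection('counters').updateOne(
        {_id: 'counterId'},
        {$inc: {val: 1}},
        {upsert: true},
        cb
    );
}

for (var i = 0; i < 100; i++) {
    updateValue(function(err) {
        if (err) { /* ... */ }
    });
}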
If I query using mongoose and the result set coming back is 1 million records, are all those records stored in memory?
Can I iterate the results such that a cursor is used? Is this done automagically? If so is there a certain way I need to iterate the results?
Reports.find({}, function(err, reports) {
    // Are all million reports stored in memory?
    reports.forEach(function(report) {
        // iterate through reports
    });
    // another way to iterate
    for (var i = 0; i < reports.length; i++) {
        var report = reports[i];
    }
});
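A find() callback like this receives the full result set as an array, so all matching documents end up in memory. A minimal sketch of streaming them with a cursor instead, assuming Mongoose 4.5+ where Query#cursor() and eachAsync() are available:

// stream documents one at a time instead of building a million-element array
var cursor = Reports.find({}).cursor();

cursor.eachAsync(function(report) {
    // process a single report here; only a small batch of
    // documents is held in memory at any one time
}).then(function() {
    console.log('done iterating');
}).catch(function(err) {
    console.log(err);
});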