How can I speed up a mongoDB (mongoose) batch insert with nodejs? - node.js

I have a set of documents in a collection that I need to copy and insert back into the same collection, changing only the parent_id on each copy. This takes a very long time and maxes out my CPU. This is my current implementation:
// find all the documents that need to be copied
models.States.find({parent_id: id, id: { $in: progress} }).exec(function (err, states) {
if (err) {
console.log(err);
throw err;
}
var insert_arr = [];
// copy every document into an array
for (var i = 0; i < states.length; i++) {
// copy with the new id
insert_arr.push({
parent_id: new_parent_id,
id: states[i].id,
// data is a pretty big object
data: states[i].data,
})
}
// batch insert
models.States.create(insert_arr, function (err) {
if (err) {
console.log(err);
throw err;
}
});
});
Here is the schema I am using
var states_schema = new Schema({
id : { type: Number, required: true },
parent_id : { type: Number, required: true },
data : { type: Schema.Types.Mixed, required: true }
});
There must be a better way to do this that I just cannot seem to come up with. Any suggestions are more than welcome! Thanks.

In a case like this there is no point in doing the copy in the application layer. Do it in the database instead (for example, from the mongo shell):
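db.States.find({parent_id: id, id: { $in: progress} }).forEach(function(doc){
  // drop the old _id so the insert creates a new document
  delete doc._id;
  // the schema field is parent_id (a Number); new_parent_id must hold the new value
  doc.parent_id = new_parent_id;
  db.States.insert(doc);
})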
If you really need to do this in mongoose, I see the following problem:
you return all the documents that match your criteria, then you iterate through them and copy them into another array (modifying them), and then the modified elements are iterated again to insert them back. So this takes at least three times as long as the approach above.
P.S. If you need to save to different collection, you should change db.States.insert(doc) to db.anotherColl.insert(doc)
P.S.2 If you can not do this from the shell, I hope you can find a way to insert my query into mongoose.
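If the copy does have to run through Mongoose, one option is to read plain objects and insert them in fixed-size batches. The sketch below covers only the document-building and batching steps in plain JavaScript. `buildCopies`, `chunk`, and the batch size of 500 are hypothetical names and values, and the Mongoose calls in the trailing comment (`lean()`, `insertMany()`) are an assumed usage, not measured advice.

```javascript
// Build copies of the found documents with a new parent_id.
// Omitting _id lets MongoDB generate a fresh one for each copy.
function buildCopies(states, newParentId) {
  return states.map(function (state) {
    return { parent_id: newParentId, id: state.id, data: state.data };
  });
}

// Split an array into batches of at most `size` elements.
function chunk(items, size) {
  var batches = [];
  for (var i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Assumed usage with Mongoose (not run here):
//   models.States.find({ parent_id: id, id: { $in: progress } }).lean()
//     .exec(function (err, states) {
//       chunk(buildCopies(states, new_parent_id), 500).forEach(function (batch) {
//         models.States.insertMany(batch, function (err) { /* handle err */ });
//       });
//     });
```

Reading with `lean()` returns plain objects rather than full Mongoose documents, which may reduce the CPU cost when `data` is large; whether it helps in a given setup is worth measuring.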

Related

Mongoose can't search by number field

I have a schema that has an id field that is set to a string. When I use collection.find({id: somenumber}) it returns nothing.
I've tried casting somenumber to a string and to a number. I've tried sending somenumber through as a regex. I've tried putting id in quotes and bare... I have no idea what's going on. Any help and input would be appreciated.
Toys.js
var Schema = mongoose.Schema;
var toySchema = new Schema( {
id: {type: String, required: true, unique: true},
name: {type: String, required: true},
price: Number
} );
My index.js is as such
app.use('/findToy', (req, res) => {
let query = {};
if (req.query.id)
query.id = req.query.id;
console.log(query);
// I've tried using the query variable and explicitly stating the object as below. Neither works.
Toy.find({id: '123'}, (err, toy) => {
if (!err) {
console.log("i'm right here, no errors and nothing in the query");
res.json(toy);
}
else {
console.log(err);
res.json({})
}
});
});
I know that there is a Toy in my mongoDB instance with id: '123'. If I do Toy.find() it returns:
[{"_id":"5bb7d8e4a620efb05cb407d2","id":"123","name":"Dog chew toy","price":10.99},
{"_id":"5bb7d8f7a620efb05cb407d3","id":"456","name":"Dog pillow","price":25.99}]
I'm at a complete loss, really.
This is what you're looking for. Visit the link for references, but here's a little snippet.
For the sake of this example, let's use a static id, even though Mongo creates a dynamic one [ _id ]. Maybe that is the problem here. If you already have a record in your DB with that id, there's no need to add it manually, especially not one that already exists. Anyway, drop your DB collection and try out this simple example:
// Search by ObjectId
const id = "123";
ToyModel.findById(id, (err, toy) => {
if(err) {
// Handle your error here
} else {
// If that 'toy' was found do whatever you want with it :)
}
});
Also, a very similar API is findOne.
ToyModel.findOne({_id: id}, function (err, toy) { ... });
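Note that `findById` and `findOne({_id: ...})` match on MongoDB's `_id`, not on a custom `id` field like the one in the question's schema. The in-memory sketch below (plain JavaScript, no database; `matchBy` is a hypothetical helper) imitates a type-sensitive equality filter to show why both the field name and the stored type matter.

```javascript
// Sample documents copied from the question's Toy.find() output.
var toys = [
  { _id: '5bb7d8e4a620efb05cb407d2', id: '123', name: 'Dog chew toy', price: 10.99 },
  { _id: '5bb7d8f7a620efb05cb407d3', id: '456', name: 'Dog pillow', price: 25.99 }
];

// Imitates an equality filter such as { id: '123' }: strict comparison,
// so the string '123' and the number 123 are different values.
function matchBy(docs, field, value) {
  return docs.filter(function (doc) {
    return doc[field] === value;
  });
}

console.log(matchBy(toys, 'id', '123').length);  // 1: custom id stored as a string
console.log(matchBy(toys, 'id', 123).length);    // 0: a number does not equal a string
console.log(matchBy(toys, '_id', '123').length); // 0: '123' is not any document's _id
```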

MongoDB: how to insert a sub-document?

I am using sub-documents in my MEAN project, to handle orders and items per order.
These are my (simplified) schemas:
var itemPerOrderSchema = new mongoose.Schema({
itemId: String,
count: Number
});
var OrderSchema = new mongoose.Schema({
customerId: String,
date: String,
items: [ itemPerOrderSchema ]
});
To insert items in itemPerOrderSchema array I currently do:
var orderId = '123';
var item = { itemId: 'xyz', itemsCount: 7 };
Order.findOne({ id: orderId }, function(err, order) {
order.items.push(item);
order.save();
});
The problem is that I obviously want one item per itemId, and this way I obtain many sub-documents per item...
One solution could be to loop through all order.items, but this is not optimal, of course (order.items could be many...).
The same problem could arise when querying order.items...
The question is: how do I insert items in itemPerOrderSchema array without having to loop through all items already inserted on the order?
If you can use an object instead of array for items, maybe you can change your schema a bit for a single-query update.
Something like this:
{
customerId: 123,
items: {
xyz: 14,
ds2: 7
}
}
So, each itemId is a key in an object, not an element of the array.
let OrderSchema = new mongoose.Schema({
customerId: String,
date: String,
items: mongoose.Schema.Types.Mixed
});
Then updating your order is super simple. Let's say you want to add 3 of items number 'xyz' to customer 123.
db.orders.update({
customerId: 123
},
{
$inc: {
'items.xyz': 3
}
},
{
upsert: true
});
Passing upsert here to create the order even if the customer doesn't have an entry.
The downsides of this:
if you use the aggregation framework, iterating over your items is either impossible or, if you have a limited, known set of itemIds, very verbose. You could solve that with mapReduce, which can be a little slower depending on how many items you have, so YMMV.
you do not have a clean items array on the client. You could fix that either by extracting it on the client (a simple let items = Object.keys(order.items).map(key => ({ itemId: key, count: order.items[key] }));) or with a mongoose virtual field or schema.path(), but this is probably another question, already answered.
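As a rough illustration of the object-keyed layout, the sketch below mimics in plain JavaScript what the `$inc` update does to `items`, and shows one way to turn the object back into an array on the client. `incrementItem` and `itemsToArray` are hypothetical helpers, not Mongoose or MongoDB APIs.

```javascript
// Mimics { $inc: { 'items.<itemId>': amount } } on a plain object:
// a missing key is treated as 0, as $inc does for a missing field.
function incrementItem(order, itemId, amount) {
  order.items = order.items || {};
  order.items[itemId] = (order.items[itemId] || 0) + amount;
  return order;
}

// Convert the items object into an array of { itemId, count } pairs.
function itemsToArray(items) {
  return Object.keys(items).map(function (key) {
    return { itemId: key, count: items[key] };
  });
}

var order = { customerId: '123', items: { xyz: 11, ds2: 7 } };
incrementItem(order, 'xyz', 3);
console.log(itemsToArray(order.items));
// [ { itemId: 'xyz', count: 14 }, { itemId: 'ds2', count: 7 } ]
```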
First of all, you probably need to add orderId to your itemPerOrderSchema because the combination of orderId and itemId will make the record unique.
Assuming that orderId is added to the itemPerOrderSchema, I would suggest the following implementation:
function addItemToOrder(orderId, newItem, callback) {
  Order.findOne({ id: orderId }, function(err, order) {
    if (err) {
      return callback(err);
    }
    ItemPerOrder.findOne({ orderId: orderId, itemId: newItem.itemId }, function(err, existingItem) {
      if (err) {
        return callback(err);
      }
      if (!existingItem) {
        // there is no such item for this order yet, adding a new one
        order.items.push(newItem);
        // return here so we do not fall through to the update below
        return order.save(function(err) {
          return callback(err);
        });
      }
      // there is already an item with itemId for this order, updating itemsCount
      ItemPerOrder.update(
        { id: existingItem.id },
        { $inc: { itemsCount: newItem.itemsCount }}, function(err) {
          return callback(err);
        }
      );
    });
  });
}
addItemToOrder('123', { itemId: '1', itemsCount: 7 }, function(err) {
  if (err) {
    // return so the success message is not printed after an error
    return console.log("Error", err);
  }
  console.log("Item successfully added to order");
});
Hope this may help.
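The read-then-write approach above takes more than one round trip per item. A commonly used alternative for embedded arrays is to let the server locate the sub-document with the positional `$` operator and fall back to a guarded `$push` only when nothing matched. The sketch below only builds the filter and update documents. `buildIncrement` and `buildPush` are hypothetical helpers, and it assumes orders are looked up by `id` and the count lives in `itemsCount`, as in the code above.

```javascript
// Step 1: increment the count of an existing item, matched by itemId.
function buildIncrement(orderId, item) {
  return {
    filter: { id: orderId, 'items.itemId': item.itemId },
    update: { $inc: { 'items.$.itemsCount': item.itemsCount } }
  };
}

// Step 2 (only if step 1 matched nothing): push the item, guarded so
// that it is added only when no sub-document has this itemId yet.
function buildPush(orderId, item) {
  return {
    filter: { id: orderId, 'items.itemId': { $ne: item.itemId } },
    update: { $push: { items: item } }
  };
}

var inc = buildIncrement('123', { itemId: 'xyz', itemsCount: 7 });
console.log(JSON.stringify(inc.update));
// {"$inc":{"items.$.itemsCount":7}}
```

Each pair could then be passed to something like `Order.updateOne(inc.filter, inc.update)`. Whether step 1 matched anything is reported in the update result, under a field whose name depends on the driver version.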

How to aggregate and group in mongoose

I have a lot of accounts, each with an employee assigned. I want to find the number of accounts for each employee. How do I do this using Mongoose's (MongoDB's) aggregate? I am familiar with the other Mongoose functions and can get the result with the following code:
exports.accountsOfEachEmployee = function(req, res) {
  Account.find({active:true}).exec(function(err, accounts){
    // return so we do not continue after sending the error response
    if (err || !accounts) return res.status(400).send({
      message: 'could not retrieve accounts from database'
    });
    var accountsOfEachEmployee = {};
    for (var i = 0; i < accounts.length; i++) {
      // the first account seen for an employee starts the count at 1
      if (!accountsOfEachEmployee[accounts[i].employee]) {
        accountsOfEachEmployee[accounts[i].employee] = 1;
      } else {
        accountsOfEachEmployee[accounts[i].employee]++;
      }
    }
    res.json(accountsOfEachEmployee);
  });
};
Is using aggregate faster? How does grouping and aggregation work in mongoose(mongodb). Following is my schema of accounts
var AccountSchema = new Schema({
active: {
type : Boolean,
default: false
},
employee: {
type: Schema.ObjectId,
ref: 'Employee'
},
});
Aggregation is faster than map-reduce for simple queries in MongoDB. I was able to get the same result using $match followed by $group with a count. This is the query I ended up using:
Account.aggregate([
  {$match: {active: true}},
  {$group: {_id: '$employee', numberOfAccounts: {$sum: 1}}}
], function(err, results) {
  res.json(results);
});
The query runs in two stages. The first stage selects all documents that are active; the second groups them by the value of employee and adds a new field, numberOfAccounts, holding the number of documents in each group.
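To make the two stages concrete, here is a plain-JavaScript imitation of what `$match` followed by `$group` with `$sum: 1` computes. It runs on an in-memory array with made-up sample data; `countActiveByEmployee` and the `count` field name are hypothetical, and the real pipeline runs inside MongoDB rather than in Node.

```javascript
function countActiveByEmployee(accounts) {
  var counts = {};
  accounts
    .filter(function (a) { return a.active === true; })   // $match
    .forEach(function (a) {                                // $group by employee
      counts[a.employee] = (counts[a.employee] || 0) + 1;  // $sum: 1
    });
  // Shape the result like aggregation output: one document per group.
  return Object.keys(counts).map(function (employee) {
    return { _id: employee, count: counts[employee] };
  });
}

var sample = [
  { active: true, employee: 'e1' },
  { active: true, employee: 'e2' },
  { active: false, employee: 'e1' },
  { active: true, employee: 'e1' }
];
console.log(countActiveByEmployee(sample));
// [ { _id: 'e1', count: 2 }, { _id: 'e2', count: 1 } ]
```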

Hide embedded document in mongoose/node REST server

I'm trying to hide certain fields in the GET output of my REST server. I have two schemas, and each has a field that embeds related data from the other into the GET, so getting /people returns a list of locations each person works at and getting /locations returns who works there. Doing that, however, adds a person.locations.employees field that lists the employees again, which obviously I don't want. So how do I remove that field from the output before displaying it? Thanks all, let me know if you need any more information.
/********************
/ GET :endpoint
********************/
app.get('/:endpoint', function (req, res) {
var endpoint = req.params.endpoint;
var model; // declared locally so it does not leak as a global
// Select model based on endpoint, otherwise throw err
if( endpoint == 'people' ){
model = PeopleModel.find().populate('locations');
} else if( endpoint == 'locations' ){
model = LocationsModel.find().populate('employees');
} else {
return res.send(404, { error: "That resource doesn't exist" });
}
// Display the results
return model.exec(function (err, obj) {
if (!err) {
return res.send(obj);
} else {
return res.send(err);
}
});
});
Here is my GET logic. I've been trying to use the query functions in mongoose after the populate function to filter out those references. Here are my two schemas.
peopleSchema.js
return new Schema({
first_name: String,
last_name: String,
address: {},
image: String,
job_title: String,
created_at: { type: Date, default: Date.now },
active_until: { type: Date, default: null },
hourly_wage: Number,
locations: [{ type: Schema.ObjectId, ref: 'Locations' }],
employee_number: Number
}, { collection: 'people' });
locationsSchema.js
return new Schema({
title: String,
address: {},
current_manager: String, // Inherit person details
alternate_contact: String, // Inherit person details
hours: {},
employees: [{ type: Schema.ObjectId, ref: 'People' }], // mixin employees that work at this location
created_at: { type: Date, default: Date.now },
active_until: { type: Date, default: null }
}, { collection: 'locations' });
You should specify the fields you want to fetch by using the select() method. You can do so by doing something like:
if( endpoint == 'people' ){
model = PeopleModel.find().select('locations').populate('locations');
} else if( endpoint == 'locations' ){
model = LocationsModel.find().select('employees').populate('employees');
} // ...
You can select more fields by separating them with spaces, for example:
PeopleModel.find().select('first_name last_name locations') ...
Select is the right answer, but it may also help to specify it in your schema so that your API stays consistent; I've found it saves me from having to remember to do it everywhere I query the object.
You can set certain fields in your schema to never return by using the select: true|false attribute on the schema field.
More details can be found here: http://mongoosejs.com/docs/api.html#schematype_SchemaType-select
SOLUTION!
Because this was so hard for me to find, I'm going to leave this here for anybody else. In order to "deselect" a field of a populated item, just prefix the field with "-" in your select. Example:
PeopleModel.find().populate({path: 'locations', select: '-employees'});
And now locations.employees will be hidden.
If you remember from your SQL days, SELECT restricts what is returned from the table(s) being queried. Restriction is one of the primitive operations of the relational model and has remained useful as that model has evolved.
In mongoose, the Query.select() method allows you to perform this operation with some extra features. Particularly, not only can you specify what attributes (columns) to return, but you can also specify what attributes you want to exclude.
So here's the example:
function getPeople(req, res, next) {
  var query = PeopleModel.find().populate({path: 'locations', select: '-employees'});
  query.exec(function(err, people) {
    // error handling stuff
    // process and return response stuff
  });
}
function getLocations(req, res, next) {
  var query = LocationsModel.find().populate({path: 'employees', select: '-locations'});
  query.exec(function(err, locations) {
    // error handling stuff
    // processing and returning response stuff
  });
}
// Express route paths start with a slash
app.get('/people', getPeople);
app.get('/locations', getLocations);
Directly from the Mongoose Docs:
Go to http://mongoosejs.com/docs/populate.html and search for "Query conditions and other options"
Query conditions and other options
What if we wanted to populate our fans array based on their age,
select just their names, and return at most, any 5 of them?
Story
.find(...)
.populate({
path: 'fans',
match: { age: { $gte: 21 }},
select: 'name -_id',
options: { limit: 5 }
})
.exec()
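The select strings used in these answers mix two forms: a bare field name includes that field, and a `-` prefix excludes it. The sketch below imitates that behaviour on a plain object to show what each form returns. `applySelect` is a hypothetical helper written for illustration and is not how Mongoose implements `select`.

```javascript
function applySelect(doc, select) {
  var fields = select.split(/\s+/).filter(Boolean);
  var excluded = fields
    .filter(function (f) { return f[0] === '-'; })
    .map(function (f) { return f.slice(1); });
  var included = fields.filter(function (f) { return f[0] !== '-'; });
  var result = {};
  Object.keys(doc).forEach(function (key) {
    if (excluded.indexOf(key) !== -1) return; // explicitly removed
    // With no inclusive fields, keep everything that was not excluded.
    // Otherwise keep only the listed fields, plus _id unless it was excluded.
    if (included.length === 0 || included.indexOf(key) !== -1 || key === '_id') {
      result[key] = doc[key];
    }
  });
  return result;
}

var fan = { _id: 1, name: 'Ann', age: 30 };
console.log(applySelect(fan, 'name -_id')); // { name: 'Ann' }
console.log(applySelect(fan, '-age'));      // { _id: 1, name: 'Ann' }
```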
I just wanted to remark that, given how simple the endpoints are, you may be able to get away with defining them this way. In general, however, this kind of dispatcher pattern is not necessary and may cause problems later when developing with Express.

Incorrect Subdocument Being Updated?

I've got a Schema with an array of subdocuments, I need to update just one of them. I do a findOne with the ID of the subdocument then cut down the response to just that subdocument at position 0 in the returned array.
No matter what I do, I can only get the first subdocument in the parent document to update, even when it should be the 2nd, 3rd, etc. Only the first gets updated no matter what. As far as I can tell it should be working, but I'm not a MongoDB or Mongoose expert, so I'm obviously wrong somewhere.
var template = req.params.template;
var page = req.params.page;
console.log('Template ID: ' + template);
db.Template.findOne({'pages._id': page}, {'pages.$': 1}, function (err, tmpl) {
console.log('Matched Template ID: ' + tmpl._id);
var pagePath = tmpl.pages[0].body;
if(req.body.file) {
tmpl.pages[0].background = req.body.filename;
tmpl.save(function (err, updTmpl) {
console.log(updTmpl);
if (err) console.log(err);
});
// db.Template.findOne(tmpl._id, function (err, tpl) {
// console.log('Additional Matched ID: ' + tmpl._id);
// console.log(tpl);
// tpl.pages[tmpl.pages[0].number].background = req.body.filename;
// tpl.save(function (err, updTmpl){
// if (err) console.log(err);
// });
// });
}
});
In the console, all of the IDs match up properly, and even when I return the updTmpl, it says that it has updated the proper record, even though it has actually updated the first subdocument and not the one it reports.
The schema just in case:
var envelopeSchema = new Schema({
background: String,
body: String
});
var pageSchema = new Schema({
background: String,
number: Number,
body: String
});
var templateSchema = new Schema({
name: { type: String, required: true, unique: true },
envelope: [envelopeSchema],
pagecount: Number,
pages: [pageSchema]
});
templateSchema.plugin(timestamps);
module.exports = mongoose.model("Template", templateSchema);
First, if you need req.body.file to be set in order for the update to execute I would recommend checking that before you run the query.
Also, is that a typo and req.body.file is supposed to be req.body.filename? I will assume it is for the example.
Additionally, and I have not done serious testing on this, but I believe your call will be more efficient if you specify your Template._id:
var template_id = req.params.template,
page_id = req.params.page;
if(req.body.filename){
db.Template.update({_id: template_id, 'pages._id': page_id},
{ $set: {'pages.$.background': req.body.filename} },
function(err, result){ // named result so it does not shadow the Express res
if(err){
// err
} else {
// success
}
});
} else {
// return error / missing data
}
Mongoose doesn't understand documents returned with the positional projection operator. It always updates an array of subdocuments positionally, not by id. You may be interested in looking at the actual queries that mongoose is building - use mongoose.set('debug', true).
You'll have to either get the entire array, or build your own MongoDB query and go around mongoose. I would suggest the former; if pulling the entire array is going to cause performance issues, you're probably better off making each of the subdocuments a top-level document - documents that grow without bounds become problematic (at the very least because Mongo has a hard document size limit).
I'm not familiar with mongoose but the Mongo update query might be:
db.Template.update( { "pages._id": page }, { $set: { "pages.$.body" : body } } )
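What the positional `$` operator does can be imitated in plain JavaScript: find the array element whose `_id` matched the query, then update that element rather than index 0. `setPageBackground` below is a hypothetical in-memory helper with made-up sample data, not a Mongoose API.

```javascript
function setPageBackground(template, pageId, background) {
  // Locate the element that matched 'pages._id'; this is the index "$" stands for.
  var index = -1;
  for (var i = 0; i < template.pages.length; i++) {
    if (String(template.pages[i]._id) === String(pageId)) {
      index = i;
      break;
    }
  }
  if (index === -1) return false; // no page matched, nothing is updated
  template.pages[index].background = background;
  return true;
}

var template = {
  name: 'demo',
  pages: [
    { _id: 'p1', number: 0, background: 'a.png' },
    { _id: 'p2', number: 1, background: 'b.png' }
  ]
};
setPageBackground(template, 'p2', 'new.png');
console.log(template.pages.map(function (p) { return p.background; }));
// [ 'a.png', 'new.png' ]
```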
