CSV to Mongo using mongoose schema - node.js

I'm trying to import a CSV file into my MongoDB collection (via mongoose) while checking for matches at each level of my schema.
So for a given schema personSchema with a nested schema carSchema:
repairSchema = {
date: Date,
description: String
}
carSchema = {
make: String,
model: String,
repair: [repairSchema]
}
personSchema = {
first_name: String,
last_name: String,
car: [carSchema]
}
and an object that I am mapping the CSV data to:
mappingObject = {
first_name : 0,
last_name: 1,
car : {
make: 2,
model: 3,
repair: {
date: 4,
description: 5
}
}
}
I want to check my collection for a match, then check each nested schema for a match, or create the entire document, as appropriate.
Desired process:
I need to check if a person document matching first_name and last_name exists in my collection.
If such a person document exists, check if that person document contains a matching car.make and car.model.
If such a car document exists, check if that car document contains a matching car.repair.date and car.repair.description.
If such a repair document exists, do nothing, exact match to existing record.
If such a repair document does not exist, push this repair onto the repair list of the appropriate car and person.
If such a car document does not exist, push this car onto the car list of the appropriate person.
If such a person document does not exist, create the document.
The kicker
This same function will be used across many schemas, which may be nested many levels deep (current database has one schema that goes 7 levels deep). So it has to be fairly abstract. I can already get the data into the structure I need as a javascript object, so I just need to get from that object to the collection as described.
It also has to be synchronous, since multiple records from the CSV could have the same person, and asynchronous creation could mean that the same person gets created twice.
Current solution
I run through each line of the CSV, map the data to my mappingObject, then step through each level of the object in javascript, checking non-object key-value pairs for a match using find, then pushing/creating or recursing as appropriate. This absolutely works, but it is painfully slow with such large documents.
Here's my full recursing function, which works:
saveObj is the object that I've mapped the CSV on to that matches my schema.
findPrevObj is initially false. path and topKey both are initially "".
lr is the line reader object, lr.resume simply moves on to the next line.
var findOrSave = function(saveObj, findPrevObj, path, topKey){
//the object used to search the collection
var findObj = {};
//if this is a nested schema, we need the previous schema search to match as well
if (findPrevObj){
for (var key in findPrevObj){
findObj[key] = findPrevObj[key];
}
}
//go through all the saveObj, compiling the findObj from string fields
for (var key in saveObj){
if (saveObj.hasOwnProperty(key) && typeof saveObj[key] === "string"){
findObj[path+key] = saveObj[key]
}
}
//search the DB for this record
ThisCollection.find(findObj).exec(function(e, doc){
//this level at least exists
if (doc.length){
//go through all the deeper levels in our saveObj
var i = 0;
for (var key in saveObj){
//nested levels are objects; string fields were already used for matching above
if (saveObj.hasOwnProperty(key) && typeof saveObj[key] === "object"){
i += 1;
findOrSave(saveObj[key], findObj, path+key+".", path+key);
}
}
//if there were no deeper levels (basically, full record exists)
if (!i){
lr.resume();
}
//this level doesn't exist, add new record or push to array
} else {
if (findPrevObj){
var toPush = {};
toPush[topKey] = saveObj;
ThisCollection.findOneAndUpdate(
findPrevObj,
{$push: toPush},
{safe: true, upsert: true},
function(err, doc) {
lr.resume();
}
)
} else {
// console.log("\r\rTrying to save: \r", saveObj, "\r\r\r");
ThisCollection.create(saveObj, function(e, doc){
lr.resume();
});
}
}
});
}

I'll update for clarity, but the person.find is to check if a person with a matching first and last name exists. If they do exist, I check each car for a match - if the car exists already, there's no reason to add this record. If the car doesn't exist, I push it to the car array for the matching person. If no person was matched, I'd save the entire new record.
Ah, what you want is to update with upsert:
replace
Person.find({first_name: "adam", last_name: "snider"}).exec(function(e, d){
//matched? check {first_name: "adam", last_name: "snider", car.make: "honda", car.model: "civic"}
//no match? create this record (or push to array if this is a nested array)
});
with
Person.update(
{first_name: "adam", last_name: "snider"},
{$push: {car: {make: 'whatever', model: 'whatever2'}}},
{upsert: true}
)
If a match is found, it will push this subdocument into the car field (creating the field if it does not exist yet): {make: 'whatever', model: 'whatever2'}.
If a match is not found, it will create a new doc that looks like:
{first_name: "adam", last_name: "snider", car: [{make: 'whatever', model: 'whatever2'}]}
This cuts your total db round trips in half. However, for even more efficiency, you can use an orderedBulkOperation. This would result in a single round trip to the database.
Here's what that would look like (using es6 here for concision...not a necessity):
const bulk = Person.collection.initializeOrderedBulkOp();
lr.on('line', function(line) {
const [first_name, last_name, make, model, repair_date, repair_description] = line.split(',');
// Ensure user exists; $setOnInsert leaves an already existing user (and their cars) untouched
bulk.find({first_name, last_name}).upsert().updateOne({$setOnInsert: {car: []}});
// Find a user with the existing make and model. This makes sure that if the car IS there, it matches the proper document structure
bulk.find({first_name, last_name, 'car.make': make, 'car.model': model}).updateOne({$set: {'car.$.repair.date': repair_date, 'car.$.repair.description': repair_description}});
// Now, if the car wasn't there, let's add it to the set. This will not push if we just updated because it should match exactly now.
bulk.find({first_name, last_name}).updateOne({$addToSet: {car: {make, model, repair: {date: repair_date, description: repair_description}}}});
});
// Send the queued operations once the file has been read (assumes the line reader emits 'end')
lr.on('end', function() {
bulk.execute().catch(function(err) { console.error(err); });
});
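The three steps per CSV row can also be written as a pure function that only builds the filter/update pairs, which makes the logic easy to inspect without a database. This is a minimal sketch: buildRowOps and the shape of its return value are hypothetical names chosen for illustration, and the pairs would still have to be handed to whichever bulk API you use.

```javascript
// Hypothetical helper: turns one CSV line into the three operations described above.
function buildRowOps(line) {
  const [first_name, last_name, make, model, repair_date, repair_description] =
    line.split(',').map((field) => field.trim());
  const person = { first_name, last_name };
  const repair = { date: repair_date, description: repair_description };

  return [
    // 1. Ensure the person exists; $setOnInsert leaves an existing person untouched.
    { filter: person, update: { $setOnInsert: { car: [] } }, upsert: true },
    // 2. If the car is already there, set its repair fields via the positional operator.
    {
      filter: { ...person, 'car.make': make, 'car.model': model },
      update: {
        $set: {
          'car.$.repair.date': repair.date,
          'car.$.repair.description': repair.description,
        },
      },
      upsert: false,
    },
    // 3. Otherwise add the whole car; $addToSet does nothing if an identical car exists.
    { filter: person, update: { $addToSet: { car: { make, model, repair } } }, upsert: false },
  ];
}

const ops = buildRowOps('adam,snider,honda,civic,2015-01-01,oil change');
console.log(JSON.stringify(ops, null, 2));
```

The order of the three operations matters, which is why an ordered bulk operation is used rather than an unordered one.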

Related

findByIdAndUpdate adding another document to the database rather than updating

I am using MEAN stack for patient CRUD operations. The update does not seem to be working properly. It adds another document to the database with the updated info but with a null id and leaves the old document that is supposed to be updated as is.
below is the code I wrote in the service for update patient
editPatient(id:string,patient: Patient){
const headers = { 'content-type': 'application/json'}
const body=patient;
console.log(body)
let url=environment.PATIENT_BASE_URL+environment.PATIENT.UPDATE_PATIENT + "?userId=" +id;
return this.httpClient.put(url, body);
}
Those are the contents of the environment file
export const environment = {
production: false,
BASE_URL:'http://localhost:3000',
PATIENT_BASE_URL:'http://localhost:3000/patients/',
PATIENT:{
GET_ALL_PATIENTS: 'list',
GET_PATIENT: 'view',
UPDATE_PATIENT: 'update',
DELETE_PATIENT: 'delete',
SEARCH_PATIENT: 'search',
ADD_PATIENT: 'add',
}
};
This is the code in patients.js
router.put('/update', function(req, res, next) {
const userId = req.body.userId;
let firstnameVal = req.body.firstName;
let lastnameVal = req.body.lastName;
let usernameVal = req.body.username;
let emailVal = req.body.email;
let birthDateVal = req.body.birthDate;
let genderVal = req.body.gender;
let patientObj = {
firstName: firstnameVal,
lastName: lastnameVal,
username: usernameVal,
email: emailVal,
birthDate : birthDateVal,
gender: genderVal
};
// patientsModel.update({'gender':'female'}, )
patientsModel.findByIdAndUpdate(userId, patientObj,{upsert: true, new: true} ,function(err, patientResponse){
if(err){
res.send({status:500, message: 'Unable to update the patient'});
}
else{
res.send({status:200, message: 'User updated successfully' ,results: patientResponse});
}
});
});
This happens because you passed this option:
upsert: true
If no item with the given id is found, it creates a new document.
The Mongoose docs describe it like this:
Using the upsert option, you can use findOneAndUpdate() as a
find-and-upsert operation. An upsert behaves like a normal
findOneAndUpdate() if it finds a document that matches filter. But, if
no document matches filter, MongoDB will insert one by combining
filter and update.
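The second argument to findByIdAndUpdate is an update object. In the MongoDB shell and the native driver, an update object without any update operators is treated as a replacement document; Mongoose by default wraps such an object in $set, but it is clearer to state the intent explicitly.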
If your intent is to replace the entire document so the only fields it contains are the ones provided in this function, add the _id to the object:
let patientObj = {
_id: new mongoose.Types.ObjectId(userId),
firstName: firstnameVal,
...
If the intent is to modify the provided fields but leave any other fields alone, use the $set update operator like
patientsModel.findByIdAndUpdate(userId, {"$set": patientObj}, ...
There are a few problems with your code, as others have said:
1. the use of upsert: true is suspicious - I can't imagine when you'd want to upsert this, and
2. the lack of $set is also unusual unless the patientObj represents the entire document you wish to set
Both of these items are causing you issues, but I suspect your main problem is actually that your ID doesn't match anything.
You mention an auto-generated ID. Mongo uses an ObjectId (though mongoose perhaps does not). Depending on how you serialise this value, its string representation would probably look like this: 63b310df2b36d95e156a237d. However, when you query for that value (as you do with userId), it may return no matches unless you convert it to an ObjectId:
userId = new mongoose.Types.ObjectId(req.body.userId)
You should also fix items 1 and 2 above.
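The $set suggestion can be sketched as a small helper that builds the update object from the request body and skips fields that were not sent. This is only an illustration: buildPatientUpdate and PATIENT_FIELDS are hypothetical names, and the field list is taken from the route handler in the question.

```javascript
// Hypothetical helper: build a $set update from the request body, skipping absent fields.
const PATIENT_FIELDS = ['firstName', 'lastName', 'username', 'email', 'birthDate', 'gender'];

function buildPatientUpdate(body) {
  const fields = {};
  for (const key of PATIENT_FIELDS) {
    // Only copy fields that were actually sent, so missing ones are not overwritten.
    if (body[key] !== undefined) {
      fields[key] = body[key];
    }
  }
  return { $set: fields };
}

console.log(buildPatientUpdate({ firstName: 'Ada', gender: 'female', userId: 'ignored' }));
```

The result would be passed as the second argument to findByIdAndUpdate, without upsert: true, so that a missing or wrong id leads to no match rather than to a new document.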

How to generate a unique 6 digits number to use as an ID for documents in a collection?

I have a collection of documents which are being added as a result of users' interactions.
Those docs already have an _id field, but I also want to add a unique human-readable ID for every existing and newly created object, in the form D123456.
What is the best way of adding such an ID and being sure that all those IDs are unique?
MongoDB doesn't have an auto-increment option like relational databases.
You can implement something yourself: before you save your document, generate an ID. First, create a database collection whose sole purpose is to hold a counter:
const Counter = mongoose.model('Counter', new mongoose.Schema({
_id: String, // the counter's name, so that a string _id like the one below can be used
current: Number
}));
Second, before you save your object, find and increment the number in the collection:
const humanReadableDocumentId = await Counter.findOneAndUpdate(
// If you give this record a name, you can have multiple counters.
{ _id: 'humanReadableDocumentId' },
{ $inc: { current: 1 } },
// If no record exists, create one. Return the new value after updating.
{ upsert: true, returnDocument: 'after' }
);
yourDocument.set('prettyId', format(humanReadableDocumentId.current));
function format(id) {
// Just an example.
return 'D' + id.toString().padStart(6, '0');
}
Note: I've tested the query in MongoDB itself, but not the 'returnDocument' option as passed through Mongoose. It should work; older Mongoose versions use { new: true } for the same purpose.
Formatting is up to you. If you have more than 999999 documents, the 'nice looking ID' in the example will just get longer and be 7+ characters.
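To make the formatting behaviour concrete, here is the same rule applied to a few counter values. The function name formatPrettyId is only illustrative; the logic is the format function from above.

```javascript
// Same formatting rule as in the answer above; the name formatPrettyId is illustrative.
function formatPrettyId(counter) {
  return 'D' + String(counter).padStart(6, '0');
}

console.log(formatPrettyId(1));       // D000001
console.log(formatPrettyId(123456));  // D123456
console.log(formatPrettyId(1000000)); // D1000000 (seven digits once the counter passes 999999)
```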

Mongoose Append in subdocument or inside the array in the subdocument itself

This is what my Mongoose document looks like:
_id: ObjectId("5c7d6b0b54795c02a6a5cb16")
people:[
{
_id: ObjectId('7c7d6b0b54795c02a6aa878')
name: John
bizs:[
{name:"Shop A"},
{name:"Shop B"}
]
},
{
_id: ObjectId('7c7d6b0b54795c02638b9cd')
name: Mary
bizs:[
{name:"Shop X"}
]
}
]
If I add a person who is already in the database (for example John) with his business, I need the business to be appended to his bizs array, e.g. Shop A, Shop B, Shop C.
But if I add a person who is not in the database yet (the way I added Mary), I need it to create a whole new object in the people array.
I have tried doing this with upsert, but it doesn't give me what I want.
How do I get something like this?
You first need to check whether a person with the name "John" (as per your example) exists, which can be done with a simple find query. If the result of that query is null or undefined, you can insert a new object (a new record in the people array); otherwise you can append the business to the person that was found and then save the document.
const name = req.body.name;
Schema.findOne({"people.name" : name} , (err , user)=>{
if(err){
//handle the error
return;
}
if(!user){
//no document contains this person: load the parent document,
//push a new person object (with its bizs) onto the people array and save it.
}
else{
//the person exists: append the new business to that person's bizs array and save
const person = user.people.find(p => p.name === name);
person.bizs.push({name: req.body.biz}); //req.body.biz is an assumed field name
user.save();
}
});
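An alternative to find-then-save is to issue two updates with complementary filters, so that only one of them applies. The sketch below only builds the filter/update pairs; buildPersonBizUpdates is a hypothetical helper, and docId, name and biz are assumed to be supplied by the caller.

```javascript
// Hypothetical helper; docId, name and biz are supplied by the caller.
function buildPersonBizUpdates(docId, name, biz) {
  return [
    // Run first: the person is already present, so append to that person's bizs array.
    {
      filter: { _id: docId, 'people.name': name },
      update: { $push: { 'people.$.bizs': biz } },
    },
    // Run second: no person with that name exists, so add a new entry to people.
    {
      filter: { _id: docId, 'people.name': { $ne: name } },
      update: { $push: { people: { name: name, bizs: [biz] } } },
    },
  ];
}

const updates = buildPersonBizUpdates('5c7d6b0b54795c02a6a5cb16', 'John', { name: 'Shop C' });
console.log(JSON.stringify(updates, null, 2));
```

The order matters here: if the second update ran first for a new person, the first update would then match and push the same business a second time.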

Easy way to only allow one item per user in mongoose schema array

I'm trying to implement a rating system and I'm struggling to only allow one rating per user in a reasonable way.
Simply put, I have an array of ratings in my schema, containing the "rater" and the rating, as such:
var schema = new Schema({
//...
ratings: [{
by: {
type: Schema.Types.ObjectId
},
rating: {
type: Number,
min: 1,
max: 5,
validate: ratingValidator
}
}],
//...
});
var Model = mongoose.model('Model', schema);
When I get a request, I wish to add the user's rating to the array if the user has not already rated this document; otherwise I wish to update the existing rating (you should not be able to give more than one rating).
One way to do this is to find the document, loop through the array of ratings and search for the user. If the user already has a rating in the array, the rating is changed; otherwise a new rating is pushed. As such:
Model.findById(id)
.select('ratings')
.exec(function(err, doc) {
if(err) return next(err);
if(doc) {
var rated = false;
var ratings = doc.ratings;
for(var i = 0; i < ratings.length; i++) {
if(String(ratings[i].by) === String(user.id)) { // by is an ObjectId, so === against a string id never matches
ratings[i].rating = rating;
rated = true;
break;
}
}
if(!rated) {
ratings.push({
by: user.id,
rating: rating
});
}
doc.markModified('ratings');
doc.save();
} else {
//Not found
}
});
Is there an easier way? A way to let mongodb do this automatically?
The mongodb $addToSet operator could be an alternative; however, I have not managed to use it for this, since it would allow two ratings with different scores from the same user.
As you note, the $addToSet operator will not work in this case, since a userId with a different vote value is a different value and therefore its own unique member of the set.
So the best way to do this is to actually issue two update statements with complementary logic. Only one will actually be applied depending on the state of the document:
async.series(
[
// Try to update a matching element
function(callback) {
Model.update(
{ "_id": id, "ratings.by": user.id },
{ "$set": { "ratings.$.rating": rating } },
callback
);
},
// Add the element where it does not exist
function(callback) {
Model.update(
{ "_id": id, "ratings.by": { "$ne": user.id } },
{ "$push": { "ratings": { "by": user.id, "rating": rating } }},
callback
);
}
],
function(err,result) {
// all done
}
);
The principle is simple: try to match the userId present in the ratings array for the document and update the entry. If that condition is not met, no document is updated. In the same way, try to match the document where that userId is not present in the ratings array; if there is a match, add the element, otherwise there will be no update.
This does bypass the built-in schema validation of mongoose, so you would have to apply your constraints manually (or inspect the schema validation rules and apply them manually), but it is better than your current approach in one very important aspect.
When you .find() the document and call it back to your client application to modify using code as you are, then there is no guarantee that the document has not changed on the server from another process or request. So when you issue .save() the document on the server may no longer be in the state that it was when it was read and any modifications can overwrite the changes made there.
Hence while there are two operations to the server and not one ( and your current code is two operations anyway ), it is the lesser of two evils to manually validate than to possibly cause a data inconsistency. The two update approach will respect any other updates issued to the document possibly occurring at the same time.
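The complementary logic of the two updates can be illustrated with a small in-memory stand-in. This sketch does not talk to MongoDB; it only mirrors the two filters to show that exactly one of the steps applies for any given user. All function names are hypothetical, and plain strings stand in for ObjectIds.

```javascript
// In-memory stand-in for the two updates; it only illustrates the matching logic.
function setExistingRating(doc, userId, rating) {
  // Mirrors { "ratings.by": userId } with { $set: { "ratings.$.rating": rating } }
  const entry = doc.ratings.find((r) => r.by === userId);
  if (!entry) return 0; // filter did not match, nothing modified
  entry.rating = rating;
  return 1;
}

function pushMissingRating(doc, userId, rating) {
  // Mirrors { "ratings.by": { $ne: userId } } with { $push: { ratings: {...} } }
  if (doc.ratings.some((r) => r.by === userId)) return 0; // filter did not match
  doc.ratings.push({ by: userId, rating: rating });
  return 1;
}

// Runs both steps in series and returns how many of them modified the document.
function rate(doc, userId, rating) {
  return setExistingRating(doc, userId, rating) + pushMissingRating(doc, userId, rating);
}

const doc = { ratings: [{ by: 'u1', rating: 3 }] };
console.log(rate(doc, 'u1', 5)); // 1: the existing rating was updated
console.log(rate(doc, 'u2', 4)); // 1: a new rating was pushed
console.log(doc.ratings.length); // 2
```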

find id of latest subdocument inserted in mongoose

I have a model schema as follows:
var A = new Schema ({
a: String,
b : [ { ba: Number, bb: String } ]
}, { collection: 'a' } );
then
var M = mongoose.model("a", A);
var saveid = null;
var m = new M({a:"Hello"});
m.save(function(err,model){
saveid = model.id;
}); // say m get the id as "1"
then
m['b'].push({ba:235,bb:"World"});
m.save(function(err,model){
console.log(model.id); //this will print 1, that is the id of the main Document only.
//here i want to find the id of the subdocument i have just created by push
});
So my question is how to find the id of the subdocument just pushed in one field of the model.
I've been looking for this answer as well, and I'm not sure that I like accessing the last document of the array. I do have an alternative solution, however. The method m['b'].push returns an integer (as with a plain JavaScript array, the new length of the array), not the subdocument. So, in order to get access to the subdocument, and particularly the _id of the subdocument, you should use the create method first, then push.
The code is as follows:
var subdoc = m['b'].create({ ba: 234, bb: "World" });
m['b'].push(subdoc);
console.log(subdoc._id);
m.save(function(err, model) { console.log(arguments); });
What is happening is that when you pass the object to either the push or the create method, the schema cast occurs immediately (including things like validation and type casting). This means that this is the time that the ObjectId is created, not when the model is saved back to Mongo. In fact, Mongo does not automatically assign _id values to subdocuments; this is a mongoose feature. The create method is described in the Mongoose documentation.
You should also note, therefore, that even though you have a subdocument _id, it is not yet in Mongo until you save it, so be wary of any DOCRef action that you might take.
The question is "a bit" old, but what I do in this kind of situation is generate the subdocument's id before inserting it.
var subDocument = {
_id: new mongoose.Types.ObjectId(),
ba:235,
bb:"World"
};
m['b'].push(subDocument);
m.save(function(err,model){
// I already know the id!
console.log(subDocument._id);
});
This way, even if there are other database operations between the save and the callback, it won't affect the id already created.
Mongoose will automatically create an _id for each new sub document, but - as far as I know - doesn't return this when you save it.
So you need to get it manually. The save method will return the saved document, including the subdocs. As you're using push you know it will be the last item in the array, so you can access it from there.
Something like this should do the trick.
m['b'].push({ba:235,bb:"World"});
m.save(function(err,model){
// model.b is the array of sub documents
console.log(model.b[model.b.length-1].id);
});
If you have a separate schema for your subdocument, then you can create the new subdocument from a model before you push it on to your parent document and it will have an ID:
var bSchema = new mongoose.Schema({
ba: Number,
bb: String
});
var a = new mongoose.Schema({
a: String,
b : [ bSchema ]
});
var bModel = mongoose.model('b', bSchema);
var subdoc = new bModel({
ba: 5,
bb: "hello"
});
console.log(subdoc._id); // Voila!
Later you can add it to your parent document:
m['b'].push(subdoc)
m.save(...

Resources