I have a program that periodically receives a new file from an FTP server, and I insert the updated file's data into my MongoDB database. The fields stay the same; only the data changes in each new file. The problem is that EVERY TIME I insert the whole file as new documents, so the collection keeps growing. For example, the first time there are 20 records, the second time 40, then 60, and so on. What I want is to check which fields' data changed in the new file before inserting it, and then update only those fields in the database instead of inserting whole new documents. Does Mongoose or MongoDB provide a solution for this? In other words, if I pass the data as a parameter, can it compare my existing collection with the new data and then update only the changed fields? Please help, I'm stuck. Thanks :) I'm using Node.js.
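// Assumed packages: `ftp` (node-ftp) for Client, `xlsx` (SheetJS) and `csvtojson`
const Client = require('ftp');
const fs = require('fs');
const XLSX = require('xlsx');
const csv = require('csvtojson');
const Model = require('./models/model'); // hypothetical path to the Mongoose model

const c = new Client();
const connectionProperties = {
  host: 'ABC',
  user: 'ABC',
  port: 'ABC',
  password: 'ABC',
};
c.connect(connectionProperties);
c.on('ready', function () {
  c.get('path-to-excel-file', function (err, stream) {
    if (err) throw err;
    const out = fs.createWriteStream('ConvertedFile.xlsx');
    // wait until the local file is fully written, not just until the FTP stream closes
    out.on('finish', function () {
      c.end();
      const workBook = XLSX.readFile('ConvertedFile.xlsx');
      // write the first sheet out as CSV
      XLSX.writeFile(workBook, 'ConvertedFile.csv', { bookType: 'csv' });
      csv()
        .fromFile('ConvertedFile.csv')
        // inserts every row again on each run, which is why the collection keeps growing
        .then((jsonObj) => Model.insertMany(jsonObj))
        .then(() => console.log('All documents inserted'))
        .catch((err) => console.error(err));
    });
    stream.pipe(out);
  });
});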
It looks like you need an upsert: update the document if it exists, otherwise insert (create) it.
This can be done one document at a time or in bulk, but either way you need a query (filter) that finds the matching document(s) first.
Since you haven't provided any sample data, I can't write a code snippet tailored to it, but here are links to get you started: for bulk operations, Bulk.find.upsert, and for a single document this thread is good: how-do-i-update-upsert-a-document-in-mongoose
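The question also asks how to detect which fields changed. With a `$set` upsert you don't strictly need to: MongoDB does not count a document as modified when the new values equal the stored ones. If you do want to compute the changes yourself, here is a minimal sketch under some assumptions: records are flat objects of strings or numbers (as parsed from a CSV row), one field (here `email`) identifies a record, and the existing documents have already been loaded into a `Map` keyed by that field. `changedFields` and `buildUpsertOps` are hypothetical helper names, not Mongoose or MongoDB APIs.

```javascript
// Hypothetical helper: returns only the fields of `incoming` whose values
// differ from the stored document. Assumes flat records of primitive values;
// nested objects or Dates would need a deeper comparison.
function changedFields(existing, incoming) {
  const changes = {};
  for (const [key, value] of Object.entries(incoming)) {
    if (existing == null || existing[key] !== value) {
      changes[key] = value;
    }
  }
  return changes;
}

// Hypothetical helper: builds one bulkWrite operation per changed or new
// record, keyed on `keyField`, that $sets only the changed fields and
// upserts records that do not exist yet.
function buildUpsertOps(records, existingByKey, keyField) {
  const ops = [];
  for (const record of records) {
    const changes = changedFields(existingByKey.get(record[keyField]), record);
    if (Object.keys(changes).length === 0) continue; // unchanged: skip it
    ops.push({
      updateOne: {
        filter: { [keyField]: record[keyField] },
        update: { $set: changes },
        upsert: true,
      },
    });
  }
  return ops;
}
```

With Mongoose you might load the existing documents with `Model.find({ email: { $in: records.map(r => r.email) } }).lean()`, build the map with `new Map(docs.map(d => [d.email, d]))`, and pass the resulting operations to `Model.bulkWrite(ops)` when the array is non-empty.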
Update: here is the mongodb bulk upsert in action:
// written for the promise-based MongoDB Node.js driver (4.x and later)
const { MongoClient } = require('mongodb');

const client = new MongoClient('mongodb://127.0.0.1:27017');

async function main() {
  await client.connect();
  const db = client.db('node-cheat-db');
  // let's assume you've read all file contents and the data is now ready for db operations
  const records = [
    { first: 'john', last: 'doe', email: 'johen@doe.com' },
    { first: 'jack', last: 'doe', email: 'jack@doe.com' },
    { first: 'jill', last: 'doe_updated', email: 'jill@doe.com' }
  ];
  // prepare a bulk upsert so that new records are created and existing ones are updated
  const bulk = db.collection('users').initializeOrderedBulkOp();
  for (const record of records) {
    // at least one key must act as the PK; in this example, email is the PK
    bulk.find({ email: record.email }).upsert().replaceOne(record);
  }
  const result = await bulk.execute();
  console.log(`Inserted: ${result.upsertedCount} and Updated: ${result.modifiedCount}`);
}

main()
  .catch(err => console.error('Bulk Upsert Error', err))
  .finally(() => client.close());
Sample output looks like:
Inserted: 0 and Updated: 3
Further details:
Clone node-cheat bulk-update, run npm install mongodb, then run node bulk-update.js.
Related
I have to read a really large CSV file, so I searched on Google and learned about createReadStream. I am using a program that reads the CSV file data and inserts it into MongoDB.
The process I am following:
Process the data using createReadStream (I think it reads the file line by line).
Store the data in an array.
Insert the data into MongoDB using insertMany.
Now the problem is that the whole file is first stored in an array and only then inserted into the database.
I think a better approach would be to store only the first 500 lines/rows in an array, insert them into the DB, and then repeat the same steps for the next 500 records.
Is it possible to achieve this?
And is this the right way to do it?
My program:
const test = async () => {
  const stream = fs.createReadStream(workerData)
    .pipe(parse())
    .on('data', async function(csvrow) {
      try {
        stream.pause()
        if (!authorName.includes(csvrow.author)) {
          const author = new Author({author: csvrow.author})
          authorId = author._id
          authorName.push(author.author)
          authorData.push(author)
        }
        if (!companyName.includes(csvrow.company_name)) {
          const company = new Company({companyName: csvrow.company_name})
          companyID = company._id
          companyName.push(company.companyName)
          companyData.push(company)
        }
        users = new User({
          name: csvrow.firstname,
          dob: csvrow.dob,
          address: csvrow.address,
          phone: csvrow.phone,
          state: csvrow.state,
          zip: csvrow.zip,
          email: csvrow.email,
          gender: csvrow.gender,
          userType: csvrow.userType
        })
        userData.push(users)
        book = new Book({
          book_number: csvrow.book_number,
          book_name: csvrow.book_name,
          book_desc: csvrow.book_desc,
          user_id: users._id,
          author_id: authorId
        })
        bookData.push(book)
        relationalData.push({
          username: users.name,
          author_id: authorId,
          book_id: book._id,
          company_id: companyID
        })
      } finally {
        stream.resume()
      }
    })
    .on('end', async function() {
      try {
        Author.insertMany(authorData)
        User.insertMany(userData)
        Book.insertMany(bookData)
        Company.insertMany(companyData)
        await Relational.insertMany(relationalData)
        parentPort.postMessage("true")
      } catch (e) {
        console.log(e)
        parentPort.postMessage("false")
      }
    })
}
test()
This program works fine and inserts the data into the DB, but I am looking for something like this:
const stream = fs.createReadStream(workerData)
  .pipe(parse())
  .on('data', async function(csvrow, maxLineToRead: 500) {
    // whole code/logic of insert data into DB
  })
Here, maxLineToRead is an imaginary option.
Basically, I want to process 500 rows at a time, insert them into the DB, and repeat this until the end of the file.
You can create a higher-scoped array variable in which you accumulate rows of data as they arrive on the data event. When you reach 500 rows, fire off your database operation to insert them. If you are not yet at 500 rows, just add the next one to the array and wait for more data events.
Then, in the end event, insert any remaining rows still in the higher-scoped array.
This way, you insert 500 rows at a time and then however many are left at the end. Compared with inserting them all at the end, this spreads the database load over the time you are parsing.
Here's an attempt to implement that type of processing. There are some unknowns (documented with comments) because the description doesn't say exactly what you're trying to accomplish in some circumstances:
const fs = require('fs');
const { workerData, parentPort } = require('worker_threads');
// parse(), the Author/Company/User/Book/Relational models and the
// authorName/companyName arrays are assumed to come from your existing code

const test = () => {
  return new Promise((resolve, reject) => {
    const accumulatedRows = [];

    async function processRows(rows) {
      // initialize data arrays that we will insert
      const authorData = [],
        companyData = [],
        userData = [],
        bookData = [],
        relationalData = [];

      // this code still has a problem that I don't have enough context
      // to know how to solve
      // If authorName contains csvrow.author, then the variable
      // authorId is not initialized, but is used later in the code
      // This is a problem that needs to be fixed.
      // The same issue occurs for companyID
      for (let csvrow of rows) {
        let authorId, companyID;
        if (!authorName.includes(csvrow.author)) {
          const author = new Author({ author: csvrow.author })
          authorId = author._id
          authorName.push(author.author)
          authorData.push(author)
        }
        if (!companyName.includes(csvrow.company_name)) {
          const company = new Company({ companyName: csvrow.company_name })
          companyID = company._id
          companyName.push(company.companyName)
          companyData.push(company)
        }
        let users = new User({
          name: csvrow.firstname,
          dob: csvrow.dob,
          address: csvrow.address,
          phone: csvrow.phone,
          state: csvrow.state,
          zip: csvrow.zip,
          email: csvrow.email,
          gender: csvrow.gender,
          userType: csvrow.userType
        });
        userData.push(users)
        let book = new Book({
          book_number: csvrow.book_number,
          book_name: csvrow.book_name,
          book_desc: csvrow.book_desc,
          user_id: users._id,
          author_id: authorId
        });
        bookData.push(book)
        relationalData.push({
          username: users.name,
          author_id: authorId,
          book_id: book._id,
          company_id: companyID
        });
      }
      // all local arrays of data are populated now for this batch
      // so add this data to the database
      await Author.insertMany(authorData);
      await User.insertMany(userData);
      await Book.insertMany(bookData);
      await Company.insertMany(companyData);
      await Relational.insertMany(relationalData);
    }

    const batchSize = 500;
    const stream = fs.createReadStream(workerData)
      .pipe(parse())
      .on('data', async function(csvrow) {
        try {
          accumulatedRows.push(csvrow);
          if (accumulatedRows.length >= batchSize) {
            stream.pause();
            await processRows(accumulatedRows);
            // clear out the rows we just processed
            accumulatedRows.length = 0;
            stream.resume();
          }
        } catch (e) {
          // calling destroy(e) will prevent leaking a stream
          // and will trigger the error event to be called with that error
          stream.destroy(e);
        }
      }).on('end', async function() {
        try {
          // insert whatever is left over after the last full batch
          await processRows(accumulatedRows);
          resolve();
        } catch (e) {
          reject(e);
        }
      }).on('error', (e) => {
        reject(e);
      });
  });
}

test().then(() => {
  parentPort.postMessage("true");
}).catch(err => {
  console.log(err);
  parentPort.postMessage("false");
});
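The same accumulate-500-then-flush idea can also be written with `for await...of`, since readable streams are async iterable in current Node.js versions. Because the loop only pulls the next row after the awaited insert finishes, there is no manual pause()/resume(). This is a sketch: `insertInBatches` is a hypothetical name, and `insertBatch` stands in for your own database call (for example, the `processRows` function from the answer above).

```javascript
// Hypothetical helper: reads rows from any async iterable (such as a CSV
// parser stream in object mode) and passes them to `insertBatch` in groups
// of `batchSize`, followed by one final, smaller group if rows remain.
async function insertInBatches(rows, batchSize, insertBatch) {
  let batch = [];
  for await (const row of rows) {
    batch.push(row);
    if (batch.length >= batchSize) {
      // The loop does not request the next row until this await settles,
      // so stream backpressure keeps it from reading far ahead meanwhile.
      await insertBatch(batch);
      batch = []; // start a fresh array; the previous one was handed off
    }
  }
  if (batch.length > 0) {
    await insertBatch(batch); // remaining rows at the end of the file
  }
}
```

Usage might look like `await insertInBatches(fs.createReadStream(workerData).pipe(parse()), 500, processRows);` inside a try/catch, since a parser error makes the loop throw. Errors on the file stream itself are not forwarded by `pipe()`, so `stream.pipeline` is safer if that matters.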
I'm using MongoDB in Node.js. I'm trying to update or insert many documents based on different conditions; however, MongoDB update (with upsert) only works with a single document or with many documents matching the same condition. Currently, I have an array containing the objects that I want to insert (or update, if the unique index already exists), and I'm looping through the array and calling updateOne; however, I believe this approach is not very efficient for the large number of objects I'm going to have.
What is a better way to achieve this?
var mongoUtil = require( './database.js' );
var db = mongoUtil.getDb();
//array of objects to insert:
//NOTE: First + Last name is a unique index
var users = [
  {firstName:"John", lastName:"Doe", points:300},
  {firstName:"Mickey", lastName:"Mouse", points:200}
];
var collection = db.collection( 'points' );
for (var i = 0; i < users.length; i++) {
  //If firstName and lastName already exists, update points. Otherwise insert new object
  collection.updateOne(
    {firstName: users[i].firstName, lastName: users[i].lastName},
    {$set: {points: users[i].points}},
    {upsert: true},
    function(err, docs) {
      if (err)
        console.log(err);
    }
  )
}
I solved this issue using .bulkWrite():
var mongoUtil = require( './database.js' );
var db = mongoUtil.getDb();
var collection = db.collection( 'points' );
var users = [
  {firstName:"John", lastName:"Doe", points:300},
  {firstName:"Mickey", lastName:"Mouse", points:200}
];
let userUpdate = users.map(user => ({
  updateOne: {
    filter: {firstName: user.firstName, lastName: user.lastName},
    update: {$set: user},
    upsert: true
  }
}));
collection.bulkWrite(userUpdate).catch(e => {
  console.log(e);
});
Here is the code:
const { MongoClient } = require('mongodb')
const db = MongoClient.connect('mongodb://172.17.0.2:27017/test')
db
  .then(
    async dataBase => {
      const eduDb = dataBase.db('edu-service-accounts')
      const accounts = eduDb.collection('accounts')
      await accounts.createIndex({ email: 1 }, { unique: true })
      await accounts.insertOne({ email: '123' })
    }
  )
The code above creates an index, but it is not unique. I have already read the official docs for the native MongoDB driver, but I can't get it to work.
And yes, I deleted all old indexes before testing that code.
Can someone please show code that really creates a unique index?
I don't mean a part of the official docs or something like that; I need code that works.
NOTE: I tested that code with a local db and with mLab, with the same result.
As the documentation says, db.createIndex(collectionname, index[, options], callback) returns the name of the created index. Try logging the result of the callback; maybe you are getting an error from the database.
Try something like:
// your connection stuff
accounts.createIndex({ email: 1 }, { unique: true }, function(err, result) {
  if (err) {
    console.log(err);
  } else {
    console.log(result);
  }
});
After that, please share the logs with us.
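To confirm whether the index really ended up unique, you can list the collection's indexes with the native driver's `accounts.indexes()` and look for `unique: true` on the `email_1` entry. A minimal helper, assuming the array shape that `indexes()` returns (`{ v, key, name, unique? }`); `hasUniqueIndex` is a hypothetical name:

```javascript
// Hypothetical helper: given the array from `await accounts.indexes()`,
// report whether a unique index exists whose key is exactly `field`.
function hasUniqueIndex(indexes, field) {
  return indexes.some(
    (index) =>
      index.unique === true &&
      Object.keys(index.key).length === 1 &&
      field in index.key
  );
}
```

Another quick check: if the index is unique, inserting `{ email: '123' }` a second time should reject with a duplicate key error (code 11000). Also note that creating a unique index fails if the collection already contains duplicate values for that field.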
I've got a problem using Mongoose and custom Grunt tasks together. All I want to do is make a task that behaves like a simple PUT request: it takes the parameters I give the task on the command line and processes/saves them to the database. However, when I look for the document in the DB after adding it, I can't find it anywhere.
The goal is to create a new company and save 4 simple parameters from a "grunt addcompany:a:b:c:d" command.
Here is the "company" model (I've kept it very basic):
var mongoose = require('mongoose'),
    Schema = mongoose.Schema;
var Company = new Schema({
  name: String,
  email: String,
  info: String,
  also: String
});
module.exports = mongoose.model('Companies', Company);
This is at the top of my Gruntfile.js:
var schemaCompany = require('./models/company'),
    mongoose = require('mongoose'),
    db = mongoose.connection,
    Company = mongoose.model('Companies', schemaCompany);
This is the task:
grunt.registerTask('addcompany', 'add a company', function(n, e, i, a) {
  var done = this.async();
  mongoose.connect('mongodb://localhost/app-test');
  db.on('open', function () {
    var co = new Company({
      name: n,
      email: e,
      info: i,
      also: a
    });
    co.save(function (err) {
      if (err) return handleError(err);
      console.log('success!');
    });
    console.log(co);
    db.close();
  });
});
When I type this in the CLI:
grunt addcompany:name:email:description:more_stuff
The CLI returns with:
Running "addcompany:name:email:description:more_stuff" (addcompany) task
{
  "name": "name",
  "email": "email",
  "info": "description",
  "also": "more_stuff",
  "_id" : ObjectID(" ~object id here~ "),
  "__v" : 0
}
Although it creates an Object ID, the document is never saved anywhere. Nothing shows up in the companies collection in the app-test db. What am I missing?
Thank you for your help!
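The task calls db.close() right after co.save(), before the asynchronous save has finished, and never calls done(). Try closing the connection and calling done() inside the save callback instead: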
db.on('open', function () {
  var co = new Company({
    name: n,
    email: e,
    info: i,
    also: a
  });
  co.save(function (err) {
    // log the doc
    console.log(co);
    // log the error (null if the save succeeded)
    console.log(err);
    // close the database only after 'co' has been saved (or the save failed)
    db.close();
    // tell Grunt the async task is finished; false marks it as failed
    if (err) return done(false);
    console.log('success!');
    done(true);
  });
});
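The same ordering can be expressed with promises, which Mongoose's `save()` and `connection.close()` return in current versions when no callback is passed. In this sketch the model and connection are passed in as parameters (a hypothetical `makeAddCompanyTask` factory) so the save-then-close-then-done order can be checked without a database. It assumes `mongoose.connect('mongodb://localhost/app-test')` has already been called; Mongoose buffers `save()` until the connection opens.

```javascript
// Hypothetical factory: returns the body of the `addcompany` Grunt task.
// `Company` is the Mongoose model, `connection` is mongoose.connection.
function makeAddCompanyTask(Company, connection) {
  return function (name, email, info, also) {
    const done = this.async(); // tell Grunt this task is asynchronous
    (async () => {
      try {
        await new Company({ name, email, info, also }).save();
        console.log('success!');
        return true;
      } catch (err) {
        console.error(err);
        return false;
      } finally {
        // runs only after save() has resolved or rejected
        await connection.close();
      }
    })().then(done, (err) => {
      // close() itself failed
      console.error(err);
      done(false);
    });
  };
}
```

In the Gruntfile this might be registered as `grunt.registerTask('addcompany', 'add a company', makeAddCompanyTask(Company, mongoose.connection));`.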
I really need help with this. I am using Node.js with MongoDB and Mongoose. So far I've managed to create a schema and save documents into my database.
var mongoose = require('mongoose'),
    Schema = mongoose.Schema;
var BerichtSchema = new Schema({
    name : String
  , mail : String
  , betreff : String
  , inhalt : String
  , datum : Date
});
var Bericht = mongoose.model('Bericht', BerichtSchema);
I have an HTML form with various fields whose data I submit; using querystring, I convert the values into readable strings:
var bericht_data = {
    name: tempo.Name
  , mail: tempo.Mail
  , betreff: tempo.Betreff
  , inhalt: tempo.Inhalt
};
var testoro = new Bericht(bericht_data);
testoro.save(function (err) {
  if (!err) console.log('Success!');
});
So tempo.Name, for example, is a string, and it is saved successfully.
So far I can save all the data from this form into my MongoDB.
Now the real problem: I want to get the data back as strings so I can use them to build dynamic HTML.
To print the data to my console, I use:
Bericht.find(
  {},
  { '_id': 0 },
  function(err, docs) {
    if (!err) {
      console.log(docs);
      // process.exit();
    }
    else { throw err; }
  }
);
The console shows all the data ever saved with my Bericht model, excluding the long _id field. Sample output:
[ { name: 'Hans', mail: 'hans@wurst.de', betreff: 'I lost my wurst', inhalt: 'look at me, I am amazing' } ]
That's just one document; normally there would be a huge amount of data.
The idea right now is to extract only the name into a string like "Hans". I want to get this name into a variable, but it seems impossible!
I've tried
Bericht.find(
  {},
  { '_id': 0 },
  function(err, docs) {
    if (!err) {
      console.log(docs.name);
      // process.exit();
    }
    else { throw err; }
  }
);
But I only get "undefined". I appreciate your help!
Take a look at Mongoose QueryStreams. I haven't used them myself, but I've modified their example code to fit your model, to give you an idea of how it might work in practice:
// Query#stream() from older Mongoose versions was replaced by Query#cursor()
var stream = Bericht.find().cursor()
  , names = [];

function endHandler() {
  console.log(JSON.stringify(names));
}

stream.on('data', function (doc) {
  if (doc.name) {
    names.push(doc.name);
  }
});
stream.on('error', function (err) {
  // handle err
});
stream.on('end', endHandler);
Mongoose's find returns an array of documents, so docs.name is undefined; you need to read the name from each element. Try the following:
Bericht.find(
  {},
  { '_id': 0 },
  function(err, docs) {
    if (!err) {
      for (var i = 0; i < docs.length; i++) {
        console.log(docs[i].name);
      }
      // process.exit();
    }
    else { throw err; }
  }
);
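Because `docs` is an array, the names have to be read per element. A small sketch that collects them into an array of strings (`extractNames` is a hypothetical helper):

```javascript
// Hypothetical helper: `docs` is the array Bericht.find() passes to its
// callback (or resolves with). Each element is one document, so read
// `name` per element and drop documents that have no name.
function extractNames(docs) {
  return docs
    .map((doc) => doc.name)
    .filter((name) => typeof name === 'string');
}
```

With a promise-based query this could look like `const names = extractNames(await Bericht.find({}, { name: 1, _id: 0 }).lean());`, and `names.join(', ')` gives a single string. If you only need the distinct names, `Bericht.distinct('name')` returns them directly.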