Bluebird promise concurrency in mongo db collection migrate operations - node.js

I am working on a script that takes the data from one MongoDB collection, processes it with some renames and mapping, and puts the result into another collection. I am running this script in Express.js with the native MongoDB client for Node.js.
Here is my script with all the running functions:
const syncCustomerWithCustomerv1 = function(params){
utils.connectDB().then((client) => {
Promise.map(aggregateDocumentsv1(client, params), function(cursor){
Promise.map(getCustomerDatav1(cursor), function(customer){
var hashedMap = makeHashedObjectv1(customer);
makeDatav1(client, customer, hashedMap).then((response)=> {
console.log("success");
}).catch((error) => {
console.log(error);
reject(error);
})
}, {concurrency: 500});
}, {concurrency: 500}).then((reponse) => {
console.log("data inserted");
})
}).catch((error) => {
console.log(error);
});
}
In syncCustomerWithCustomerv1 I am able to move the data from the old collection to the new collection, but I do not think the operations actually run concurrently. While the operation is running I am not able to hit the API, so it does not allow other requests to run alongside it.
In Promise.map(aggregateDocumentsv1(...)) I am taking the list of cursors. Each element of the listOfCursor array is a cursor which, when queried, yields 500 records.
I expect each cursor to be passed to the inner Promise.map(getCustomerDatav1(cursor)), which yields each customer from my old Mongo collection; we can then perform the mapping operations on that object and insert the data into the new collection.
Does anyone see the issue and know how I can make this properly concurrent, so that I can run this script without my APIs suffering any downtime?

I don't know about MongoDB, but there are a few problems with your promise code:
For any function (be it Promise.map or then) to be able to wait for the result of an asynchronous callback, that callback must return a promise to be awaited.
You are doing 500 concurrent operations, where each of those operations does 500 concurrent operations. That's a total concurrency factor of 250000! You probably want to reduce that a bit.
function syncCustomerWithCustomerv1(params){
    utils.connectDB().then(client => {
        return Promise.map(aggregateDocumentsv1(client, params), cursor => {
//      ^^^^^^
            return Promise.map(getCustomerDatav1(cursor), customer => {
//          ^^^^^^
                var hashedMap = makeHashedObjectv1(customer);
                return makeDatav1(client, customer, hashedMap).then(response => {
//              ^^^^^^
                    console.log("success");
                }, error => {
                    console.log(error);
                });
            }, {concurrency: 500});
        }, {concurrency: 500})
    }).then(reponse => {
        console.log("data inserted");
    }, error => {
        console.log(error);
    });
}
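To see how the two concurrency limits multiply, here is a minimal dependency-free sketch. It does not use Bluebird: mapLimit is a hypothetical helper that only approximates what Promise.map does with the concurrency option, and fakeInsert is a timer standing in for the real database write. The point is that the inner promise is returned to the outer map, and that the peak number of operations in flight is bounded by the outer limit times the inner limit.

```javascript
// Hypothetical helper (not Bluebird): run `mapper` over `items`
// with at most `limit` calls in flight at any time.
async function mapLimit(items, limit, mapper) {
  const results = new Array(items.length);
  let next = 0;
  async function worker() {
    while (next < items.length) {
      const i = next++;
      results[i] = await mapper(items[i], i);
    }
  }
  const workers = Array.from({ length: Math.min(limit, items.length) }, worker);
  await Promise.all(workers);
  return results;
}

// Simulated insert (an assumption, stands in for the real write).
// It tracks how many calls are in flight at once.
let inFlight = 0;
let peak = 0;
function fakeInsert(doc) {
  inFlight++;
  peak = Math.max(peak, inFlight);
  return new Promise(resolve =>
    setTimeout(() => { inFlight--; resolve(doc * 2); }, 5));
}

async function migrate(batches, outerLimit, innerLimit) {
  peak = 0;
  // The arrow function returns the inner promise, so the outer map waits for it.
  const out = await mapLimit(batches, outerLimit,
    batch => mapLimit(batch, innerLimit, fakeInsert));
  return { out, peak };
}
```

With an outer limit of 2 and an inner limit of 3, the reported peak never exceeds 6. With 500 and 500 the same bound is 250000, which is why lowering one or both numbers is worth trying.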

Related

Asynchronous CRUD operations with express

I have a basic CRUD application using HTML forms, nodejs/express and mongodb. I have been learning about synchronous vs asynchronous code via callbacks, promises, and async/await, and to my understanding for a CRUD application you would want the operations to be asynchronous so multiple users can do the operations at the same time. I am trying to implement async/await with my express CRUD operations and am not sure if they are executing synchronously or asynchronously.
Here is my update function, which allows a user to type in the _id of the blog they want to change, then type in a new title and new body for the blog and submit it. In its current state, to my knowledge it is executing synchronously:
app.post('/update', (req, res) => {
const oldValue = { _id: new mongodb.ObjectId(String(req.body.previousValue)) }
const newValues = { $set: { blogTitle: req.body.newValue, blogBody: req.body.newValue2 } }
db.collection("miscData").updateOne(oldValue, newValues, function (err, result) {
if (err) throw err;
console.log("1 document updated");
res.redirect('/')
});
})
The way in which I was going to change this to asynchronous was this way:
app.post('/update', async (req, res) => {
const oldValue = { _id: new mongodb.ObjectId(String(req.body.previousValue)) }
const newValues = { $set: { blogTitle: req.body.newValue, blogBody: req.body.newValue2 } }
await db.collection("miscData").updateOne(oldValue, newValues, function (err, result) {
if (err) throw err;
console.log("1 document updated");
res.redirect('/')
});
})
Both blocks of code work, however I am not sure if the second block of code is doing what I am intending it to do, which is allow a user to update a blog without blocking the call stack, or if the second block of code would only make sense if I was running more functions after the await. Does this achieve the intended purpose, if not how could/should I do that?
db.collection(...).updateOne is always asynchronous, so you need not worry that a long-running database operation might block your application. There are two ways to obtain the asynchronous result:
With a callback function
db.collection(...).updateOne(oldValues, newValues, function(err, result) {...});
console.log("This happens synchronously");
The callback function with the two parameters (err, result) will be called asynchronously, after the database operation has completed (and after the console.log). Either err contains a database error message or result contains the database result.
With promises
try {
var result = await db.collection(...).updateOne(oldValues, newValues);
// Do something with result
} catch(err) {
// Do something with err
}
console.log("This happens asynchronously");
The updateOne function without a callback function as third parameter returns a promise that must be awaited. The statements that do something with result will be executed asynchronously, after the database operation has successfully completed. If a database error occurs, the statements in the catch block are executed instead. In either case (success or error), the console.log is only executed afterwards.
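(If updateOne does not have a two-parameter version, you can write
var coll = db.collection(...);
var result = await util.promisify(coll.updateOne.bind(coll))(oldValues, newValues);
using util.promisify. The bind keeps this pointing at the collection when the promisified function is called.)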
Your second code snippet contains a mixture of both ways (third parameter plus await), which does not make sense.
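The ordering difference between the two styles can be checked without a database. The sketch below is only an illustration: fakeUpdateOne is a hypothetical stand-in (a 10 ms timer, not the MongoDB driver) that accepts either a callback or no callback, and the events array records the order in which things happen.

```javascript
// Hypothetical stand-in for a driver method that supports both calling styles.
function fakeUpdateOne(filter, update, callback) {
  const work = new Promise(resolve =>
    setTimeout(() => resolve({ modifiedCount: 1 }), 10));
  if (typeof callback === 'function') {
    // Callback style: report through the callback, return nothing to await.
    work.then(result => callback(null, result), err => callback(err));
    return undefined;
  }
  // Promise style: the caller awaits the returned promise.
  return work;
}

const events = [];

function callbackStyle(done) {
  fakeUpdateOne({ _id: 1 }, { $set: { a: 1 } }, (err, result) => {
    events.push('callback ran');
    done();
  });
  // This line runs before the callback, because the call returns immediately.
  events.push('after callback-style call');
}

async function promiseStyle() {
  const result = await fakeUpdateOne({ _id: 1 }, { $set: { a: 1 } });
  // This line runs only after the simulated update has completed.
  events.push('await finished');
  return result;
}
```

In neither style is the event loop blocked while the update is pending, so other requests can still be served in the meantime.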

What is/how to write multiple tx.run as a single transaction with the neo4j-driver in NODEjs?

I'm trying to sync two databases and for each type of operation I want it to be a whole transaction, I've setup something rather basic but I'm unsure if this will actually run and rollback as a single transaction. I've been looking through the documentation but the one for NodeJS is rather scarce.
My example code:
const tx = session.beginTransaction();
SyncElements(doc.lists.elements, existingElements, tx);
tx.commit()
.then(res => {
console.log('SYNC ELEMENTS SUCCESSFUL, RESULT:', res);
})
.catch(err => {
console.error('COULD NOT SYNC ELEMENTS TRANSACTION FAILURE: ', err);
});
Now within that SyncElements() function which gets the tx object, it will call tx.run( dozens of times.
My question is: will those dozens of tx.run calls all be considered part of the main transaction that I'm committing above? And if any of those calls fails, will all the changes made using that tx object be reversed?
EDIT: my second attempt:
Each of my runs returns the promise as such:
deleteElements.forEach(el => {
allOperations.push(deleteElement(el, tx));
});
and then
const deleteElement = (
el: Element,
tx: neo4j.default.Transaction,
): Promise<neo4j.default.StatementResult> => {
return tx.run(`MATCH... STATEMENT`);
};
I'm storing them in an array and then running:
try {
await Promise.all(allOperations);
await tx.commit();
return;
} catch (err) {
await tx.rollback();
console.error('COULD NOT SYNC ELEMENTS TRANSACTION FAILURE: ', err);
}
I think this is more in line with the documentation? I guess in my first attempt I didn't really have any way to roll back the transaction, so I just assumed it would not commit if there was an error?

NodeJS: Handling transactions with NoSQL databases?

Consider a promise-chained chunk of code for example:
return Promise.resolve()
.then(function () {
return createSomeData(...);
})
.then(function () {
return updateSomeData(...);
})
.then(function () {
return deleteSomeData(...);
})
.catch(function (error) {
return ohFishPerformRollbacks();
})
.then(function () {
return Promise.reject('something failed somewhere');
})
In the above code, let's say something went wrong in the function updateSomeData(...). Then one would have to revert the create operation that was executed before this.
In another case, if something went wrong in the function deleteSomeData(...), then one would want to revert the operations executed in createSomeData(...) and updateSomeData(...).
This would continue as long as all the blocks have some revert operations defined for themselves in case anything goes wrong.
If only there were a way, in NodeJs, in the database, or somewhere in the middle, to revert all the operations happening under the same block of code.
One way I can think of to make this happen is flagging all the rows in the database with a transactionId (ObjectID) and a wasTransactionSuccessful (boolean), so that CRUD operations could be grouped by their transactionIds, and in case something goes wrong, the rows belonging to that transaction could simply be deleted from the database in the final catch block.
I read about rolling back transactions in https://docs.mongodb.com/manual/tutorial/perform-two-phase-commits/. But I want to see if it can be done in a simpler fashion and in a generic manner that NoSQL databases could adopt.
I am not sure if this would satisfy your use case, but I hope it would.
let indexArray = [1, 2, 3];
let promiseArray = [];
let sampleFunction = (index) => {
return new Promise((resolve, reject) => {
setTimeout(resolve, 100, index);
});
}
indexArray.forEach((element) => {
promiseArray.push(sampleFunction(element));
});
Promise.all(promiseArray).then((data) => {
// do whatever you want with the results
}).catch((err) => {
//Perform your entire rollback here
});
async.waterfall([
firstFunc,
secondFunc
], function (err, result) {
if (err) {
// delete the entire thing
}
});
Using the async library would give you a much more elegant solution than chaining.
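The idea from the question, where every step defines its own revert operation, can also be sketched with plain async/await. This is an assumption-laden illustration, not a real transaction: runWithCompensation and the { run, undo } step shape are hypothetical names, the undo functions can themselves fail, and nothing here gives you isolation from other writers.

```javascript
// Hypothetical helper: each step is an object { run, undo }.
// Steps run in order. If one fails, the steps that already completed
// are undone in reverse order and the original error is rethrown.
async function runWithCompensation(steps) {
  const completed = [];
  try {
    for (const step of steps) {
      await step.run();
      completed.push(step);
    }
  } catch (error) {
    for (const step of completed.reverse()) {
      await step.undo();
    }
    throw error;
  }
}

// Usage with in-memory stand-ins for the create/update/delete calls.
async function demo() {
  const log = [];
  const step = (name, fails = false) => ({
    run: async () => {
      if (fails) throw new Error(name + ' failed');
      log.push(name);
    },
    undo: async () => { log.push('undo ' + name); }
  });
  try {
    await runWithCompensation([
      step('create'),
      step('update'),
      step('delete', true) // simulate a failure in the last step
    ]);
  } catch (error) {
    log.push('caught: ' + error.message);
  }
  return log;
}
```

Here the failing delete step triggers the undo of update first and then of create, which matches the order described in the question.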

How to handle chained promises in a loop in nodejs with bluebird

The gist of the problem is:
for (let i=0;i<list.length;i++) {
AsyncCall_1({'someProperty': list[i] })
.then((resp_1) => {
resp_1.doSomething();
resp_1.AsyncCall_2()
.then((resp_2) => {
resp_2.doSomething();
})
})
}
after last resp.AsyncCall_2.then(()=> {
//do something
})
I need to chain all the promises sequentially, so that the loop waits for the "resp.AsyncCall_2" function to be resolved before its next iteration. After the last "resp.AsyncCall_2" call, I want to do something (since all the promises will be resolved by then).
Actual Problem:
for (var i=0;i<todo.assignTo.length;i++) {
Users.findOne({'username': todo.assignTo[i] })
.then((user) => {
user.assigned.push(todo.title);
user.notificationCount.assignedTodosCount++;
user.save()
.then((user) => {
console.log("todo is assigned to the user: " + user.username)
})
})
}
//do something after the last call has resolved (I know this is the wrong way of doing this)
Users.find({})
.then((users)=> {
var promises = [];
for (var i=0;i<users.length;i++) {
users[i].notificationCount.totalTodosCount++;
promises.push(users[i].save());
}
Promise.all(promises)
.then(()=> {
res.statusCode = 200;
res.setHeader('Content-Type', 'application/json');
console.log("todo is successfully posted");
res.json({success : true, todo});
},(err) => next(err))
.catch((err) => next(err));
})
Thank You in Advance..
In modern versions of node.js, you can just use async/await and don't need to use Bluebird iteration functions:
async function someMiddlewareFunc(req, res, next) {
try {
for (let item of list) {
let resp_1 = await AsyncCall_1({'someProperty': item });
resp_1.doSomething();
let resp_2 = await resp_1.AsyncCall_2();
resp_2.doSomething();
}
// then do something here after the last iteration
// of the loop and its async operations are done
res.json(...);
} catch(err) {
next(err);
}
}
This will serialize the operations (which is what you asked for), so the 2nd iteration of the loop doesn't start until the async operations in the first iteration are done.
But it doesn't appear from your real code that you actually need to serialize the individual operations, and serializing things that don't have to be serialized usually makes the end-to-end time longer. So you could run all the items in your loop in parallel, collect all the results at the end, and then send your response. Bluebird's Promise.map() is quite useful for that because it combines a .map() and a Promise.all() into one function call:
function someMiddlewareFunc(req, res, next) {
Promise.map(list, (item) => {
return AsyncCall_1({'someProperty': item}).then(resp_1 => {
resp_1.doSomething();
return resp_1.AsyncCall_2();
}).then(resp_2 => {
return resp_2.doSomething();
});
}).then(results => {
// all done here
res.json(...);
}).catch(err => {
next(err);
});
}
FYI, when using res.json(...), you don't need to set these res.statusCode = 200; or res.setHeader('Content-Type', 'application/json'); as they will be done for you automatically.
Further notes about Bluebird's Promise.map(). It accepts a {concurrency: n} option that tells Bluebird how many operations are allowed to be "in flight" at the same time. By default, it runs them all in parallel at the same time, but you can pass any number you want as the concurrency option. If you pass 1, it will serialize things. Using this option can be particularly useful when parallel operation is permitted, but the array is very large and iterating all of them in parallel runs into either memory usage problems or overwhelms the target server. In that case, you can set the concurrency value to some intermediate value that still gives you some measure of parallel execution, but doesn't overwhelm the target (some number between 5 and 20 is often appropriate - it depends upon the target service). Sometimes, commercial services (like Google) also have limits about how many requests they will handle at the same time from the same IP address (to protect them from one account using too much of the service at once) and the concurrency value can be useful for that reason too.
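The difference between the serialized loop and the parallel version can be made visible with native promises alone. The sketch below is only an illustration under stated assumptions: it uses Promise.all rather than Bluebird, delay and task are hypothetical stand-ins for the real async calls, and the trace arrays record the order in which tasks start and end.

```javascript
// Resolve with `value` after `ms` milliseconds.
const delay = (ms, value) =>
  new Promise(resolve => setTimeout(resolve, ms, value));

// One item at a time: the next call starts only after the previous one resolves.
async function sequential(items, fn) {
  const results = [];
  for (const item of items) {
    results.push(await fn(item));
  }
  return results;
}

// All items at once: every call is started before any of them is awaited.
function parallel(items, fn) {
  return Promise.all(items.map(item => fn(item)));
}

async function compare() {
  const trace = [];
  const task = async n => {
    trace.push('start ' + n);
    await delay(5);
    trace.push('end ' + n);
    return n * 10;
  };
  const seqResults = await sequential([1, 2, 3], task);
  const seqTrace = trace.splice(0); // move the recorded entries out, emptying trace
  const parResults = await parallel([1, 2, 3], task);
  return { seqResults, parResults, seqTrace, parTrace: trace };
}
```

Both versions return the results in input order. In seqTrace every start is immediately followed by its matching end, while in parTrace all three starts are recorded before the first end.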
Have you tried Promise.each?
const users = todo.assignTo.map(function(username) {
return Users.findOne({'username': username });
});
Promise.each(users, function(user) {
user.assigned.push(todo.title);
user.notificationCount.assignedTodosCount++;
// return the promise so Promise.each waits for the save before moving on
return user.save()
.then((user) => {
console.log("todo is assigned to the user: " + user.username)
})
});

sharing a transaction or task object between multiple queries across multiple controllers in pg-promise

I am relatively new to node.js, postgresql, promises and in fact Stack Overflow, so apologies in advance if things sound a little disjointed!
I am currently trying to run multiple queries within chained promises spread across various controllers. I want to run all the queries within the same transaction or task to eliminate multiple connects and disconnects to the database.
I have tried the following where I am adding a student and assigning two mentors for that student. The HTTP request is routed to the student controller which adds a student via the student repository. The student repository is where the task starts and is returned to the controller which forwards it to the mentor controller and along the chain it goes...
@HttpPost("/api/students/create")
addStudent(@Req() request) {
var studentid;
var mentorids= [];
//Add the student
return this.studentRepository.addStudent(request.body.student)
.then(newStudentId => {
studentid = newStudentId;
//Add the mentors, passing the transaction object received back from studentRepository
return this.mentorController.addMentor(request.body.schoolid, request.body.mentors, newStudentId.transaction)
.then(result => {
var data = [];
console.log(result);
for (var role in result) {
data.push({
mentorid: result[role].roleid,
studentid: studentid
});
}
//Assigns the student to mentors in the link table
return this.studentRepository.assignMentor(data)
.then(result => {
return result;
})
})
});
}
Student repository
addStudent(student): any {
return this.collection.task(t => {
return this.collection.one(this.sql.addStudent, student)
.then(studentid => {
return {
studentid: studentid.studentid,
transaction: t
}
});
})
}
Mentor controller
addMentor(institutionid: number, mentors, t): any {
var promises = [];
var mentorIds = [];
for (var role in mentors) {
promises.push(this.roleController.registerRole(institutionid,mentors[role].role,t));
}
return t.batch(promises)
.then(result => {
return Promise.resolve(result);
})
}
Role controller
@HttpPost("/api/roles/register")
registerRole(institutionid, @Req() request, t?): any {
console.log(request);
return this.roleRepository.checkRoleEnrollment(institutionid, request.email, request.roletype, t)
.then(result => {
return this.roleRepository.addRoleEnrollment(institutionid, request, t)
.then(data => {
return this.roleRepository.updateRoleEnrollment(data.roleenrollmentid, data.roleid)
.then(d => {
return data;
})
})
})
.catch (error => {
return Promise.reject(error);
});
}
I am getting the following error when I call checkRoleEnrollment in the Role Controller:
"name": "Error",
"message": "Unexpected call outside of task.",
"stack": "Error: Unexpected call outside of task. at Task.query
(\api\node_modules\pg-promise\lib\task.js:118:19)
at Task.obj.oneOrNone (\api\node_modules\pg-promise\lib\database.js:491:31)
at RoleRepository.checkRoleEnrollment....
Any help would be much appreciated. Thanking you in advance.
As per my earlier comment:
That error means you are trying to access connection t allocated by a task somewhere outside of the task's callback function, i.e. the task's callback has returned, the connection was released, and then you are using the connection object allocated by the task from somewhere else, which is, of course, invalid.
b.t.w. I'm the author of pg-promise ;)
Below is what your code is effectively doing, in a simplified form:
var cnReference;
db.task(t => {
cnReference = t;
// can only use `t` connection while executing the callback
})
.then(data => {
// we are now outside of the task;
// the task's connection has been closed already,
// and we can do nothing with it anymore!
return cnReference.query('SELECT...');
})
.catch(error => {
// ERROR: Unexpected call outside of task.
// We cannot use a task connection object outside of the task's callback!
});
You need to correct the implementation to make sure this doesn't happen.
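The lifetime rule can be reproduced without a database. The sketch below does not use pg-promise: makeFakeDb is a hypothetical stand-in whose task method only mimics the behavior described above (the t object stops working once the callback has settled), and the SQL strings are placeholders. It contrasts leaking t out of the task with doing all the work inside the callback.

```javascript
// Hypothetical stand-in that mimics the lifetime rule; this is not pg-promise.
function makeFakeDb() {
  return {
    async task(callback) {
      let open = true;
      const t = {
        async query(sql) {
          if (!open) throw new Error('Unexpected call outside of task.');
          return 'ran: ' + sql;
        }
      };
      try {
        return await callback(t);
      } finally {
        open = false; // the connection is released once the callback settles
      }
    }
  };
}

// Wrong: `t` escapes the task and is used after the callback has returned.
async function leaky(db) {
  const t = await db.task(async t => t);
  return t.query('SELECT 1');
}

// Right: every query that needs `t` runs inside the callback,
// and only plain data is returned from the task.
async function contained(db) {
  return db.task(async t => {
    const student = await t.query('INSERT student');
    const mentor = await t.query('INSERT mentor');
    return [student, mentor];
  });
}
```

Applied to the question, this suggests starting the task at the outermost level (for example in the student controller) and passing t down into the repositories and other controllers while the callback is still running, instead of returning t from the repository.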
