This is a common pattern in my previous projects, so I usually end up with a very complex use case. Take, for example:
async function doThis() {
  for (100x) {
    try {
      insertToDatabase()
      await selectAndManipulateData()
      createEmailWorker()
      /** and many more **/
    } catch {
      logToAFile()
    }
  }
}
The code works, but it's complicated: one function doing all the things. The only reason I do this is that I can verify in real time that if one function fails, the following ones won't run, so there won't be any incorrect data.
What I want to know is: what is the best architecture for structuring a project without sacrificing data integrity? (Or is it already good enough?)
const doThis = async () => {
  try {
    for (100x) {
      await insertToDatabase();
      await selectAndManipulateData();
      await createEmailWorker();
      /** and many more **/
    }
  } catch {
    await logToAFile();
  }
}
The best way of doing this is to always use await when calling any asynchronous function, and to use ES6 syntax, which gives you more features. Your function should always be async.
Always put your loop inside a try/catch: any error will land in the catch block, and it will be specific to the call that failed.
Actually, I would separate the persistence, manipulation, and email jobs. Consider that storing your data is a single responsibility. In addition, your manipulation and email workers should run as scheduled jobs. Once a job is triggered, it should check whether there is data related to its responsibility.
Another way is to replace these scheduled jobs with triggered jobs. You can build a chain of responsibility in which each job triggers the next, and each job decides whether it needs to run.
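A minimal sketch of the triggered-jobs idea, with hypothetical names and only two stages (persist and email) for brevity: each job has a single responsibility, checks whether there is work for it, and then triggers the next job in the chain.

```javascript
// Chain-of-responsibility sketch: each job checks whether it has work,
// does that one thing, then hands off to the next job in the chain.
function makeJob(name, hasWork, doWork) {
  return {
    name,
    next: null,
    async run(context) {
      if (hasWork(context)) {
        await doWork(context);
      }
      if (this.next) {
        await this.next.run(context); // trigger the next responsibility
      }
    },
  };
}

// Wire a persist -> email chain; the email job only acts on data
// that the persist job actually committed.
const persistJob = makeJob(
  'persist',
  (ctx) => ctx.records.length > 0,
  async (ctx) => { ctx.persisted = [...ctx.records]; }
);
const emailJob = makeJob(
  'email',
  (ctx) => (ctx.persisted || []).length > 0,
  async (ctx) => { ctx.emailsSent = ctx.persisted.length; }
);
persistJob.next = emailJob;
```

If the persist step throws, the chain stops before the email job runs, which preserves the ordering guarantee the question cares about.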
The lambda's job is to see if a query returns any results and alert subscribers via an SNS topic. If no rows are returned, all good, no action needed. This has to be done every 10 minutes.
For various reasons, I was told that we can't have any triggers added on the database, and no on-prem environment is suitable to host a cron job.
Here comes lambda.
This is what I have in the handler, inside a loop for each database.
sequelize.authenticate()
  .then(() => {
    for (let j = 0; j < databases[i].rawQueries.length; j++) {
      sequelize.query(databases[i].rawQueries[j])
        .then(results => {
          if (results[0].length > 0) {
            let message = "Temporary message for testing purposes" // + query results
            publishSns("Auto Query Alert", message)
          }
        })
        .catch(err => {
          publishSns("Auto Query SQL Error", `The following query could not be executed: ${databases[i].rawQueries[j]}\n${err}`)
        })
    }
  })
  .catch(err => {
    publishSns("Auto Query DB Connection Error", `The following database could not be accessed: ${databases[i].database}\n${err}`)
  })
  .then(() => sequelize.close())
// sns publisher
function publishSns(subject, message) {
  const params = {
    Message: message,
    Subject: subject,
    TopicArn: process.env.SNStopic
  }
  return SNS.publish(params).promise()
}
I have 3 separate database configurations, and for those few SELECT queries, I thought I could just loop through the connection instances inside a single lambda.
The process is asynchronous and it takes 9 to 12 seconds per invocation, which I assume is far from optimal.
The whole thing feels very sub-optimal, but that's my current level :)
To make things worse, I now read that lambda and sequelize don't really play well together:
I am using sequelize because that's the only way I could get 3 connections to the database in the same invocation to work without issues. I tried the mssql and tedious packages and couldn't get either of them to work.
It now feels like using an ORM is overkill for this very simple task of a SELECT query, and I would really like to at least have the connections and their queries run asynchronously to save some execution time.
I am looking into different ways to accomplish this, and I went down the rabbit hole and now have more questions than before! Generators? Are they still useful? Observables with RxJS? Could this apply here? Async/await or just promises? Do I even need sequelize?
Any guidance/opinion/criticism would be very appreciated
I'm not familiar with sequelize.js, but I hope I can help. I don't know your level with RxJS and Observables, but it's worth a try.
I think you could definitely use Observables and RxJS.
I would start with an interval() that will run the code every time you define.
You can then pipe the interval, since it's an Observable: do the auth bit, then map() to get an array of Observables, one for each .query call. (I'm assuming all your calls, authenticate and query, return Promises, so you can turn them into Observables with from().) You can then use something like forkJoin() with that array to get a response after all calls are done.
In the .subscribe at the end, you would make the publishSns().
You can pipe a catchError() too and process errors.
The map() part might not be necessary; you could do it beforehand and store the result in a variable, since it doesn't depend on the authenticate value.
I'm certain my solution isn't the only one or the best, but I think it would work.
Hope it helps and let me know if it works!
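If RxJS feels heavy, the forkJoin idea maps onto plain promises as Promise.all: fire all the queries for a database at once and wait for the batch. A minimal sketch, where runQuery is a hypothetical stand-in for the sequelize.query call:

```javascript
// Run every database's queries concurrently instead of sequentially.
// `databases` mirrors the question's shape: { database, rawQueries }.
async function checkAllDatabases(databases, runQuery) {
  const perDatabase = databases.map(async (db) => {
    // Fire all of this database's queries in parallel.
    const results = await Promise.all(
      db.rawQueries.map((q) => runQuery(db, q))
    );
    return { database: db.database, results };
  });
  // Wait for every database to finish before returning.
  return Promise.all(perDatabase);
}
```

Note that a single rejected query rejects the whole batch; to keep the original "alert and continue" behaviour, wrap runQuery in its own catch, as the posted code does.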
I have a Firebase function to decrease the commentCount when a comment is deleted, like this
export const onArticleCommentDeleted = functions.firestore.document('articles/{articleId}/comments/{uid}').onDelete((snapshot, context) => {
return db.collection('articles').doc(context.params.articleId).update({
commentCount: admin.firestore.FieldValue.increment(-1)
})
})
I also have firebase functions to recursively delete comments of an article when it's deleted
export const onArticleDeleted = functions.firestore.document('articles/{id}').onDelete((snapshot, context) => {
  const commentsRef = db.collection('articles').doc(snapshot.id).collection('comments');
  return db.recursiveDelete(commentsRef); // this triggers onArticleCommentDeleted multiple times
})
When I delete an article, onArticleCommentDeleted is triggered and tries to update the article that has already been deleted. Of course I can check whether the article exists before updating it, but that's really cumbersome and a waste of resources.
Are there any ways to avoid from propagating further triggers?
I think the problem arises from the way I use the trigger. In general, it's not a good idea for an onDelete trigger on a child document to update its own parent. This will surely cause conflicts.
Instead, at the client side, I use a transaction:
...
// client SDK: runTransaction and increment come from 'firebase/firestore'
runTransaction(db, async (trans) => {
  trans.delete(commentRef);
  trans.update(articleRef, {
    commentCount: increment(-1)
  });
});
This makes sure that if one of the operations fails they both fail, and it eliminates the triggers. Relying on the client side is not ideal, but I think the trade-off is acceptable.
There is no way to prevent triggering the Cloud Function on the comments when you delete the comments for that article. You will have to check for that condition in the function code itself, as you already said.
I am using promises in Bluebird to send MySQL queries and manage control flow & errors. Here is what it looks like:
sendQuery1(..)
.then(sendQuery2(..))
.then(function(results from last query){
if(rain){
res.render(...)
}else{
/*
I need to send additional 2 queries here
*/
}
}).catch(errors);
The promise chain is very convenient, but I found that error handling gets messy when there are multiple sub-chains inside.
Here I would probably need to write the following inside the /* */:
return sendQuery3(..)
.then(sendQuery4(..))
.then(function(..){
res.render(".....")
}).catch(error2);
Are there any better ways to handle this type of problems?
I don't exactly see your problem here. When you have a chain inside a chain (which you normally shouldn't; check your architecture right there), you can just catch the errors as shown.
I'd advise you to use a global (or local) error-handler function and pass it to catch. That way, even when you have multiple catches, you can reuse the same error handler.
The best solution, in my opinion, would be to create a "promise chain bypass", using catch to skip certain parts based on your condition. If this is not what you are looking for, please clarify your problem.
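The shared error-handler idea can be sketched like this (handleError and the sendQuery stubs are hypothetical names, not from the question):

```javascript
// One reusable error handler, passed to every .catch in the chain.
function handleError(err) {
  // Centralised logging/recovery; return a value so callers can inspect it.
  return { failed: true, reason: err.message };
}

function run(sendQuery1, sendQuery2) {
  return sendQuery1()
    .then(() => sendQuery2())
    .catch(handleError); // the same handler works for any sub-chain too
}
```

Because every catch shares one handler, adding a nested sub-chain doesn't multiply the error-handling code.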
You will catch the error; there's no problem in your code. Yes, it gets messy, but you can take the following approach to keep the code clean and still implement your logic:
sendQuery1(..)
.then(sendQuery2(..))
.then(function(results from last query){
if(rain){
res.render(...)
throw new Error('breakChain'); //intentionally throwing error to skip the remaining chain
}
return; //will act like 'else'
})
.then(sendQuery3(..))
.then(sendQuery4(..))
.catch(function (e) {
if(e.message != 'breakChain') //act on error if it was other than 'breakChain'
throw e;
});
What you are discussing here is branching the chain based on some logical condition. It is generally better to actually branch the chain than to throw an error just to abort the rest of it. That keeps errors as errors, rather than following the scheme Shaharyar proposed of synthesizing an error that isn't really an error.
You can branch the chain by returning a new promise chain from within a .then() handler like this:
sendQuery1(..).then(function(r1) {
return sendQuery2(...);
}).then(function(r2){
if (rain){
// processing is done, so just render
res.render(...)
} else {
// return promise here to attach this new branch to the original chain
return sendQuery3(..).then(sendQuery4).then(function() {
// process last query
});
}
}).catch(errors);
FYI, since you've only posted pseudo-code, not real code, and we can't see which functions need access to which prior results in order to do their work, we can't fully optimize the code. This is just an example of how branching works inside a promise chain; that's the principle you probably want to use. If you show your real code, we can offer a much more specific and optimized answer.
I'm still learning the node.js ropes and am just trying to get my head around what I should be deferring, and what I should just be executing.
I know there are other questions relating to this subject generally, but I'm afraid without a more relatable example I'm struggling to 'get it'.
My general understanding is that if the code being executed is non-trivial, it's probably a good idea to make it async, so as to avoid holding up someone else's session. There's clearly more to it than that, callbacks get mentioned a lot, and I'm not 100% sure why you wouldn't just do everything synchronously. I've got some way to go.
So here's some basic code I've put together in an express.js app:
app.get('/directory', function(req, res) {
process.nextTick(function() {
Item.
find().
sort( 'date-modified' ).
exec( function ( err, items ){
if ( err ) {
return next( err );
}
res.render('directory.ejs', {
items : items
});
});
});
});
Am I right to be using process.nextTick() here? My reasoning is that as it's a database call then some actual work is having to be done, and it's the kind of thing that could slow down active sessions. Or is that wrong?
Secondly, I have a feeling that if I'm deferring the database query then it should be in a callback, and I should have the actual page rendering happening synchronously, on condition of receiving the callback response. I'm only assuming this because it seems like a more common format from some of the examples I've seen - if it's a correct assumption can anyone explain why that's the case?
Thanks!
You are using it wrong in this case, because .exec() is already asynchronous (you can tell by the fact that it accepts a callback as a parameter).
To be fair, most of what needs to be asynchronous in nodejs already is.
As for page rendering, if you require the results from the database to render the page, and those arrive asynchronously, you can't really render the page synchronously.
Generally speaking it's best practice to make everything you can asynchronous rather than relying on synchronous functions ... in most cases that would be something like readFile vs. readFileSync. In your example, you're not doing anything synchronously with i/o. The only synchronous code you have is the logic of your program (which requires CPU and thus has to be synchronous in node) but these are tiny little things by comparison.
I'm not sure what Item is, but if I had to guess, .find().sort() builds a query internally. It does not actually run the query (talk to the DB) until .exec is called. .exec takes a callback, so it communicates with the DB asynchronously; when that communication is done, the callback is called.
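That deferred-execution pattern can be sketched with a toy query builder (makeModel is hypothetical, not how Mongoose is actually implemented): find() and sort() only record intent, and nothing asynchronous happens until exec() is called.

```javascript
// Toy query builder: find()/sort() accumulate state synchronously;
// exec() is the only call that does (simulated) async I/O.
function makeModel(rows) {
  const state = { sortKey: null };
  return {
    find() { return this; },     // record intent, stay chainable
    sort(key) { state.sortKey = key; return this; },
    exec(callback) {
      setImmediate(() => {       // callback on a later tick, like real I/O
        const out = [...rows].sort((a, b) =>
          String(a[state.sortKey]).localeCompare(String(b[state.sortKey])));
        callback(null, out);
      });
    },
  };
}
```

Usage mirrors the question's shape: makeModel(rows).find().sort('name').exec(callback).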
Using process.nextTick does nothing useful in this case. It just delays the execution of its code until the next tick of the event loop, which there is no need to do. It has no effect on whether anything is synchronous or not.
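The deferral is easy to observe; a sketch:

```javascript
// process.nextTick only defers its callback until the current
// synchronous code finishes; it adds no concurrency.
const order = [];
order.push('sync 1');
process.nextTick(() => order.push('deferred'));
order.push('sync 2');
// once the current tick ends, order is ['sync 1', 'sync 2', 'deferred']
```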
I don't really understand your second question, but if the rendering of the page depends on the result of the query, you have to defer rendering until the query completes, which you are doing by rendering in the callback. The rendering itself (res.render) may not be entirely synchronous either; it depends on the internal mechanism of the library that defines the render function.
In your example, next is not defined. Instead your code should probably look like:
app.get('/directory', function(req, res) {
Item.
find().
sort( 'date-modified' ).
exec(function (err, items) {
if (err) {
console.error(err);
res.status(500).end("Database error");
}
else {
res.render('directory.ejs', {
items : items
});
}
});
});
I am wondering if node.js makes any guarantee on the order async calls start/complete.
I do not think it does, but I have read a number of code samples on the Internet that I thought would be buggy because the async calls may not complete in the order expected, yet the examples are often presented as illustrations of how great node is because of its single-threaded asynchronous model. However, I cannot find a direct answer to this general question.
Is it a situation that different node modules make different guarantees? For example at https://stackoverflow.com/a/8018371/1072626 the answer clearly states the asynchronous calls involving Redis preserves order.
The crux of the problem boils down to this: is the following execution (or similar) strictly safe in node?
var fs = require("fs");
fs.unlink("/tmp/test.png");
fs.rename("/tmp/image1.png", "/tmp/test.png");
According to the author the call to unlink is needed because rename will fail on Windows if there is a pre-existing file. However, both calls are asynchronous, so my initial thoughts were that the call to rename should be in the callback of unlink to ensure the asynchronous I/O completes before the asynchronous rename operation starts otherwise rename may execute first, causing an error.
Async operations do not have a determined time to execute.
When you call unlink, it asks the OS to remove the file, but it is not defined when the OS will actually remove it; it might be a millisecond or a year later.
The whole point of async operations is that they don't depend on each other unless explicitly stated.
In order for rename to occur after unlink, you have to modify your code like this:
fs.unlink("/tmp/test.png", function (err) {
if (err) {
console.log("An error occured");
} else {
fs.rename("/tmp/image1.png", "/tmp/test.png", function (err) {
if (err) {
console.log("An error occured");
} else {
console.log("Done renaming");
}
});
}
});
or, alternatively, use the synchronous versions of the fs functions (note that these will block the executing thread):
fs.unlinkSync("/tmp/test.png");
fs.renameSync("/tmp/image1.png", "/tmp/test.png");
There are also libraries, such as async, that make asynchronous code look better:
async.waterfall([
  fs.unlink.bind(null, "/tmp/test.png"),
  fs.rename.bind(null, "/tmp/image1.png", "/tmp/test.png")
], function (err) {
  if (err) {
    console.log("An error occured");
  } else {
    console.log("done renaming");
  }
});
Note that in all of these examples the error handling is extremely simplified, just to illustrate the idea.
If you look at the Node.js documentation, you'll find that fs.unlink takes a callback as an argument:
fs.unlink(path, [callback]);
An action that you intend to take when the current function completes should be passed to the function as the callback argument. So typically, in your case, the code will take the following form:
var fs = require("fs");
fs.unlink("/tmp/test.png", function(){
fs.rename("/tmp/image1.png", "/tmp/test.png");
});
In the specific case of unlink and rename there are also synchronous functions in Node.js, used as fs.unlinkSync(path) and fs.renameSync(oldPath, newPath). This will ensure that the code runs synchronously.
Moreover, if you wish to use an asynchronous implementation but retain better readability, you could consider a library like async. It also has options for different modes of execution, like parallel, series, waterfall, etc.
Hope this helps.