NodeJS Filesystem sync and performance - node.js

I've run into an issue with NodeJS where, due to some middleware, I need to directly return a value which requires knowing the last modified time of a file. Obviously the correct way would be to do
getFilename: function(filename, next) {
fs.stat(filename, function(err, stats) {
// Do error checking, etc...
next('', filename + '?' + new Date(stats.mtime).getTime());
});
}
however, due to the middleware I am using, getFilename must return a value, so I am doing:
getFilename: function(filename) {
  var stats = fs.statSync(filename);
  return filename + '?' + new Date(stats.mtime).getTime();
}
I don't completely understand the nature of the NodeJS event loop. What I was wondering is whether statSync has any special sauce in it that somehow pumps the event loop (or whatever it is called in node, the stack of instructions waiting to be performed) while the file information is loading. Or is it really blocking, so that this code is going to cause performance nightmares down the road and I should rewrite the middleware I am using to use a callback? If it does have special sauce that lets the event loop continue while it is waiting on the disk, is that available anywhere else (through some promise library or something)?

Nope, there is no magic here. If you block in the middle of the function, everything is blocked.
If performance becomes an issue, I think your only option is to rewrite that part of the middleware, or get creative with how it is used.
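One way to "get creative" without rewriting the middleware is to pay the blocking cost only occasionally. This is a minimal sketch, assuming a slightly stale modification time is acceptable for your use case; the cache, the `MAX_AGE_MS` budget, and the `now` parameter are my own illustration and not part of the original middleware:

```javascript
const fs = require('fs');

// Hypothetical cache: filename -> { mtimeMs, checkedAt }
const mtimeCache = new Map();
const MAX_AGE_MS = 5000; // assumed staleness budget; tune for your app

function getFilename(filename, now = Date.now()) {
  const cached = mtimeCache.get(filename);
  if (cached && now - cached.checkedAt < MAX_AGE_MS) {
    // Cache hit: no disk access, so nothing blocks
    return filename + '?' + cached.mtimeMs;
  }
  // Cache miss: statSync still blocks the event loop, but only this once
  const mtimeMs = fs.statSync(filename).mtime.getTime();
  mtimeCache.set(filename, { mtimeMs, checkedAt: now });
  return filename + '?' + mtimeMs;
}
```

The function still returns a value synchronously, which is what the middleware requires, but most calls never touch the disk. The trade-off is that a changed file may be served with its old timestamp for up to `MAX_AGE_MS`.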

Related

How to check file is writable (resource is not busy nor locked)

excel4node's write-to-file function catches errors and does not propagate them to the caller. Therefore, my app cannot determine whether the write to the file was successful or not.
My current workaround is like below:
let fs = require('fs')
try {
let filePath = 'blahblah'
fs.writeFileSync(filePath, '') // Try-catch is for this statement
excel4nodeWorkbook.write(filePath)
} catch (e) {
console.log('File save is not successful')
}
It works, but I think it's a sort of hack and not a semantically correct way. I also tested fs.access and fs.accessSync, but they only check permissions, not the state (busy/locked) of the resource.
Is there any suggestion to make this look and behave nicer without modifying the excel4node source code?
I think you are asking the wrong question. If you check at time T, then write at time T + 1ms, what would guarantee that the file is still writeable?
If the file is not writeable for whatever reason, the write will fail, period. Nothing to do. Your code is fine, but you can probably also do without the fs.writeFileSync(), which will just erase whatever else was in the file before.
You can also write to a randomly-generated file path to make reasonably sure that two processes are not writing to the same file at the same time, but again, that will not prevent all possible write errors, so what you really, really want is rather some good error handling.
In order to handle errors properly you have to provide a callback!
Something along the lines of:
excel4nodeWorkbook.write(filePath, (err) => {
if (err) console.error(err);
});
Beware, this is asynchronous code, so you need to handle that as well!
You already marked a line in the library's source code. If you look a few lines above, you can see it uses the handler argument to pass any errors to. In fact, peeking at the documentation comment above the function, it says:
If callback is given, callback called with (err, fs.Stats) passed
Hence you can simply pass a function as your second argument and check for err like you've probably already seen elsewhere in the node environment:
excel4nodeWorkbook.write(filepath, (err) => {
if (err) {
console.error(err);
}
});
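If you would rather handle the result with promises than with a callback, the callback form can be wrapped. This is only a sketch: `writeWorkbook` is a helper name I made up, and `fakeWorkbook` is a hypothetical stand-in so the snippet runs without excel4node installed. The only thing assumed about the real library is the `write(filePath, callback)` form shown above.

```javascript
// Wrap a Node-style write(filePath, callback) method in a Promise
function writeWorkbook(workbook, filePath) {
  return new Promise((resolve, reject) => {
    workbook.write(filePath, (err, stats) => {
      if (err) reject(err);
      else resolve(stats);
    });
  });
}

// Hypothetical stand-in for an excel4node workbook, for illustration only
const fakeWorkbook = {
  write(filePath, callback) {
    const err = filePath ? null : new Error('no path given');
    // setImmediate keeps the callback asynchronous, like real file I/O
    setImmediate(callback, err, err ? undefined : { size: 0 });
  }
};

writeWorkbook(fakeWorkbook, 'report.xlsx')
  .then(() => console.log('File saved'))
  .catch((e) => console.log('File save is not successful:', e.message));
```

With this in place, a failed write lands in `.catch()` (or in a `try`/`catch` around `await`), which is the error path the original workaround was trying to get.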

Is there any risk to read/write the same file content from different 'sessions' in Node JS?

I'm new to Node JS and I wonder whether the code below has a multi-session problem.
Consider that I have a Node JS server (express) and I listen for a POST request:
app.post('/sync/:method', onPostRequest);
// Function declaration, so it is hoisted and already defined when app.post() runs
function onPostRequest(req, res) {
  // parse request and fetch email list
  var emails = [....]; // pseudocode
  doJob(emails);
  res.status(200).end('OK');
}
function doJob(_emails) {
  try {
    var emailsFromFile = fs.readFileSync(FILE_PATH, "utf8") || {};
    if (_.isString(emailsFromFile)) {
      emailsFromFile = JSON.parse(emailsFromFile);
    }
    _emails.forEach(function(_email) {
      if (!emailsFromFile[_email]) {
        emailsFromFile[_email] = 0;
      }
      else {
        emailsFromFile[_email] += 1;
      }
    });
    // write object back
    fs.writeFileSync(FILE_PATH, JSON.stringify(emailsFromFile));
  } catch (e) {
    console.error(e);
  }
}
So the doJob method receives the _emails list and I update (counter +1) these emails in the emailsFromFile object loaded from the file.
Consider that I get 2 requests at the same time and they trigger doJob twice. I am afraid that after one request has loaded emailsFromFile from the file, the second request might change the file content.
Can anybody shed some light on this issue?
Because the code in the doJob() function is all synchronous, there is no risk of multiple requests causing a concurrency problem.
If you were using async IO in that function, then there would be possible concurrency issues.
To explain, Javascript in node.js is single threaded. So, there is only one thread of Javascript execution running at a time and that thread of execution runs until it returns back to the event loop. So, any sequence of entirely synchronous code like you have in doJob() will run to completion without interruption.
If, on the other hand, you use any asynchronous operations such as fs.readFile() instead of fs.readFileSync(), then that thread of execution will return back to the event loop at the point you call fs.readFile() and another request can be run while it is reading the file. If that were the case, then you could end up with two requests conflicting over the same file. In that case, you would have to implement some form of concurrency protection (some sort of flag or queue). This is the type of thing that databases offer lots of features for.
I have a node.js app running on a Raspberry Pi that uses lots of async file I/O, and I can have conflicts in that code from multiple requests. I solved it by setting a flag anytime I'm writing to a specific file. Any other request that wants to write to that file first checks the flag and, if it is set, goes into my own queue and is served when the prior request finishes its write operation. There are many other ways to solve that too. If this happens in a lot of places, then it's probably worth just getting a database that offers features for this type of write contention.
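To make the flag-and-queue idea concrete, here is a minimal sketch of one way to do it. It is not the code from my app: `withFileLock` and `incrementCounts` are hypothetical names, and the queue is just a promise chain per file path. Note that it counts a first occurrence as 1, which differs slightly from the counter logic in the question.

```javascript
const fs = require('fs');

// One promise chain per file path: each task starts only after the previous one settles
const queues = new Map();

function withFileLock(filePath, task) {
  const previous = queues.get(filePath) || Promise.resolve();
  // Ignore an earlier task's failure so one error does not stall the queue
  const run = previous.catch(() => {}).then(() => task());
  queues.set(filePath, run.catch(() => {}));
  return run;
}

// Async read-modify-write that is safe against overlapping requests
function incrementCounts(filePath, emails) {
  return withFileLock(filePath, async () => {
    let counts = {};
    try {
      counts = JSON.parse(await fs.promises.readFile(filePath, 'utf8'));
    } catch (e) {
      if (e.code !== 'ENOENT') throw e; // a missing file just means "no counts yet"
    }
    for (const email of emails) {
      counts[email] = (counts[email] || 0) + 1;
    }
    await fs.promises.writeFile(filePath, JSON.stringify(counts));
    return counts;
  });
}
```

Two requests that call `incrementCounts` on the same path at the same time are run one after the other, so neither overwrites the other's update. This only protects against conflicts inside a single Node process; several processes writing the same file would still need a database or an OS-level lock.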

To async, or not to async in node.js?

I'm still learning the node.js ropes and am just trying to get my head around what I should be deferring, and what I should just be executing.
I know there are other questions relating to this subject generally, but I'm afraid without a more relatable example I'm struggling to 'get it'.
My general understanding is that if the code being executed is non-trivial, then it's probably a good idea to async it, so as to avoid it holding up someone else's session. There's clearly more to it than that, and callbacks get mentioned a lot, and I'm not 100% on why you wouldn't just sync everything. I've got some ways to go.
So here's some basic code I've put together in an express.js app:
app.get('/directory', function(req, res) {
process.nextTick(function() {
Item.
find().
sort( 'date-modified' ).
exec( function ( err, items ){
if ( err ) {
return next( err );
}
res.render('directory.ejs', {
items : items
});
});
});
});
Am I right to be using process.nextTick() here? My reasoning is that as it's a database call then some actual work is having to be done, and it's the kind of thing that could slow down active sessions. Or is that wrong?
Secondly, I have a feeling that if I'm deferring the database query then it should be in a callback, and I should have the actual page rendering happening synchronously, on condition of receiving the callback response. I'm only assuming this because it seems like a more common format from some of the examples I've seen - if it's a correct assumption can anyone explain why that's the case?
Thanks!
You are using it wrong in this case, because .exec() is already asynchronous (you can tell by the fact that it accepts a callback as a parameter).
To be fair, most of what needs to be asynchronous in nodejs already is.
As for page rendering, if you require the results from the database to render the page, and those arrive asynchronously, you can't really render the page synchronously.
Generally speaking it's best practice to make everything you can asynchronous rather than relying on synchronous functions ... in most cases that would be something like readFile vs. readFileSync. In your example, you're not doing anything synchronously with i/o. The only synchronous code you have is the logic of your program (which requires CPU and thus has to be synchronous in node) but these are tiny little things by comparison.
I'm not sure what Item is, but if I had to guess what .find().sort() does is build a query string internally to the system. It does not actually run the query (talk to the DB) until .exec is called. .exec takes a callback, so it will communicate with the DB asynchronously. When that communication is done, the callback is called.
Using process.nextTick does nothing useful in this case. It just defers the call until the current synchronous code has finished, before the event loop continues, and there is no need to do that here. It has no effect on whether the database call is synchronous or not.
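A small sketch of what process.nextTick actually changes, which is ordering and nothing else. The `order` array is only there for illustration:

```javascript
const order = [];

setTimeout(() => order.push('timeout'), 0);      // runs later, in the timers phase
process.nextTick(() => order.push('nextTick'));  // runs right after the current synchronous code
order.push('sync');                              // runs immediately

setTimeout(() => console.log(order.join(' -> ')), 20);
// prints "sync -> nextTick -> timeout"
```

The deferred function still runs on the same single thread, so wrapping work in process.nextTick does not make that work any less blocking.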
I don't really understand your second question, but if the rendering of the page depends on the result of the query, you have to defer rendering of the page until the query completes -- you are doing this by rendering in the callback. The rendering itself res.render may not be entirely synchronous either. It depends on the internal mechanism of the library that defines the render function.
In your example, next is not defined. Instead your code should probably look like:
app.get('/directory', function(req, res) {
  Item.
    find().
    sort( 'date-modified' ).
    exec(function (err, items) {
      if (err) {
        console.error(err);
        res.status(500).end("Database error");
      }
      else {
        res.render('directory.ejs', {
          items : items
        });
      }
    });
});

Should I avoid calling require when responding to a request?

Is requiring a module going to block every single request? According to the docs, the module is cached after the first require, but I wanted to see if it's an anti-pattern to do a dynamic require when responding to a request.
Nope, it won't block on every request (as long as you're requiring the same module each time), and it's not an anti-pattern.
If you're loading the same module on each request, any call to require will return instantly (because the module will have already been loaded, compiled, and cached). If, however, many different modules may be required so that you don't get the benefit of caching, it may be better to do an asynchronous require.
But something like this?
function handler(req, res) { require('fs').readFile(…); }
No big deal. It's just a matter of style.
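You can check the caching behaviour yourself. This is a sketch that writes a throwaway module to a temp directory; the module, the `handler` function, and the `__demoLoadCount` global are all hypothetical and exist only for the demo:

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical throwaway module that counts how many times its body is executed
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'require-demo-'));
const modulePath = path.join(dir, 'counter.js');
fs.writeFileSync(
  modulePath,
  'global.__demoLoadCount = (global.__demoLoadCount || 0) + 1;\n' +
  'module.exports = { loadedAt: Date.now() };\n'
);

// Dynamic require, as it might appear inside a request handler
function handler() {
  return require(modulePath);
}

const first = handler();  // reads, compiles, and caches the module
const second = handler(); // returns the cached exports object

console.log(first === second, global.__demoLoadCount);
// prints "true 1"
```

The module body runs once; every later require of the same resolved path returns the same exports object without reading the file again.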
I am often told that blocking of any kind is a no-no in node.js, and that asynchronicity is one of its main imperatives. You could try the following.
Quoting the answer from non-blocking require in node.js
This is how require is implemented:
> console.log(require.extensions['.js'].toString())
function (module, filename) {
var content = NativeModule.require('fs').readFileSync(filename, 'utf8');
module._compile(stripBOM(content), filename);
}
You can do the same thing in your app. I guess something like this would work:
var fs = require('fs')
require.async = function(filename, callback) {
fs.readFile(filename, 'utf8', function(err, content) {
if (err) return callback(err)
module._compile(content, filename)
// this require call won't block anything because of caching
callback(null, require(filename))
})
}
require.async('./test.js', function(err, module) {
console.log(module)
})
It is not about slow or fast. require is a synchronous operation. This means that it will block the whole server while executing. If you have 100000 connections, all of them will wait for 100000 requires.
Never use require inside a loop; it is bad practice.
So the answer to your original question is YES.

node.js middleware and js encapsulation

I'm new to JavaScript, and jumped right into node.js. I've read a lot of theory, and began well with the practical side (I'm writing an API for a mobile app), but I have one basic problem, which has led me to middleware. I've successfully implemented a middleware function, but I would like to know whether the way I'm using middleware is OK, and also resolve the original problem which brought me to middleware. My question is two-fold, as follows:
1) From what I could gather, the idea of using middleware is repeating a process before actually processing the request. I've used it for token verification, as follows:
Only one of my urls doesn't receive a token parameter, so
app.js
app.get('/settings', auth.validateToken, auth.settings);
auth.js
function validateToken(req, res, next){ //code };
In validateToken, my code checks the token, then calls next() if everything is OK, or modifies res as json to return a specific error code.
My questions regarding this are: a) Is this a correct use of middleware? b) Is there a [correct] way of passing a value onto the next function? Instead of calling next only if everything is OK, is there a [correct] way of calling next either way, and knowing from inside the next function (whichever it is) whether the middleware was successful or not? If there is, would this be a proper use of middleware? This precise point brings me to my original problem, and part two of this question, which is encapsulating functions:
THIS PART WAS FIXED, SEE MY SECOND COMMENT.
2) I discovered middleware trying to simply encapsulate validateToken, and be able to call it from inside the functions that the get handlers point to, for example auth.settings.
I'm used to common, sequential programming, and not in javascript, and haven't for the life of me been able to understand how to do this, taking into account the event-based nature of node.js.
What I want to do right now is write a function which simply verifies the user and password. I have it perfectly written inside a particular handler, but was about to copy-paste it to another one, so I stopped. I want to do things the right way from scratch, and understand node.js. One of the specific problems I've been having is that the error code I have to return when user and password don't match is different depending on the parent function, so I would need this function to be able to tell the callback function "hey, the password and user don't match", so that from the parent function I can respond with the correct message.
I think what I actually want is to write an asynchronous function I can call from inside another one.
I hope I've been clear, I've been trying to solve this on my own, but I can't quite finish wrapping my head around what my actual problem is, I'm guessing it's due to my recent introduction to node.js and JS.
Thanks in advance! Jennifer.
1) There is a res.locals object (http://expressjs.com/api.html#res.locals) designed to store data local to the request and to pass it from one middleware to another. After the request is processed, this object is disposed of. If you want to store data within the session, you can use req.session.
2) If I understand your question, you want a function asynchronously passing the response to the caller. You can do it in the same way most node's functions are designed.
You define a function in this way:
function doSomething(parameters, callback) {
  var err = null, result;
  // ... do something and set result
  // if (errorCondition()) err = errorCode();
  if (callback) callback(err, result);
}
And the caller instead of using the return value of the function passes callback to this function:
function caller(req, res, next) {
//...
doSomething(params, function(err, result) {
if (! err && result) {
// do something with the result
next();
} else {
// do something else
next();
// or even res.redirect('/error');
}
});
}
If you find yourself writing similar callback functions, you should define them as named functions and just pass the function as a parameter:
//...
doSomething(param, processIt);
function processIt(err, result) {
// ...
}
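Applied to the user/password case from the question, the pattern could look like the sketch below. Everything here is hypothetical: the in-memory `users` store, the handler names, and the error codes are made up for illustration, and a real check would query a database and compare password hashes.

```javascript
// Hypothetical in-memory user store, for illustration only
const users = new Map([['jennifer', 'correct-horse']]);

// Reusable check: reports the outcome through a Node-style callback
function verifyCredentials(user, password, callback) {
  // setImmediate keeps the callback asynchronous, as a real database lookup would be
  setImmediate(() => {
    callback(null, users.has(user) && users.get(user) === password);
  });
}

// Each parent handler picks its own response for a mismatch
function loginHandler(req, res) {
  verifyCredentials(req.body.user, req.body.password, (err, matches) => {
    if (err) return res.status(500).json({ code: 'E_INTERNAL' });
    if (!matches) return res.status(401).json({ code: 'E_LOGIN_MISMATCH' });
    res.json({ ok: true });
  });
}

function deleteAccountHandler(req, res) {
  verifyCredentials(req.body.user, req.body.password, (err, matches) => {
    if (err) return res.status(500).json({ code: 'E_INTERNAL' });
    if (!matches) return res.status(403).json({ code: 'E_DELETE_DENIED' });
    res.json({ deleted: true });
  });
}
```

The shared function only says whether the credentials match; the decision about which error code to send stays in the parent, which is what lets two handlers reuse it with different responses.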
What keeps you confused, probably, is that you don't treat functions as values yet, which is very characteristic of JavaScript (not counting languages that are little used).
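For point 1), here is a minimal sketch of calling next() either way and letting the following function see the outcome through res.locals. The token value and the `tokenValid` property name are assumptions made for the example; the functions only rely on the usual Express (req, res, next) signature.

```javascript
// Middleware: records the outcome on res.locals and always calls next()
function validateToken(req, res, next) {
  res.locals.tokenValid = req.query.token === 'secret-demo-token'; // hypothetical check
  next();
}

// Next function in the chain: decides what to do based on the recorded outcome
function settings(req, res) {
  if (!res.locals.tokenValid) {
    return res.status(401).json({ error: 'invalid token' });
  }
  res.json({ theme: 'dark' });
}

// Assumed wiring in an Express app:
// app.get('/settings', validateToken, settings);
```

Whether this is better than stopping in the middleware is a design choice: it keeps the error response in the handler, at the cost of every handler having to remember to check the flag.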
In validateToken, my code checks the token, then calls next() if everything is OK, or modifies res as json to return a specific error code.
a) Is this a correct use of middleware?
b) is there a [correct] way of passing a value onto the next function?
Yes, that is the correct way of using middleware, although depending on the response message type and specifications you could use the built-in error handling of connect. That is, in this example, generate a 401 status code by calling next({status:401,stack:'Unauthorized'});
The middleware system is designed to handle the request by going through a series of functions until one function replies to the request. This is why the next function only takes one argument, which is an error.
-> if an error object is passed to the next function, then it will be used to create a response and no further middleware will be processed. The manner in which the error response is created is as follows:
// default to 500
if (res.statusCode < 400) res.statusCode = 500;
debug('default %s', res.statusCode);
// respect err.status
if (err.status) res.statusCode = err.status;
// production gets a basic error message
var msg = 'production' == env
? http.STATUS_CODES[res.statusCode]
: err.stack || err.toString();
-> to pass values down the middleware stack modifying the request object is the best method. This ensures that all processing is bound to that specific request and since the request object goes through every middleware function it is a good way to pass information down the stack.
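Both points can be put together in a short sketch: pass an error to next() on failure, and attach data to the request on success. The token value, the `req.user` payload, and the handler names are assumptions for the example; the four-argument error handler is the standard Express/connect form.

```javascript
// Middleware: on failure pass an error to next(), on success attach data to the request
function validateToken(req, res, next) {
  if (req.query.token !== 'secret-demo-token') { // hypothetical check
    const err = new Error('Unauthorized');
    err.status = 401;
    return next(err);
  }
  req.user = { name: 'jennifer' }; // hypothetical payload for later handlers
  next();
}

// Route handler: only reached when validateToken called next() without an error
function settings(req, res) {
  res.json({ user: req.user.name });
}

// Error-handling middleware: recognized by its four arguments
function errorHandler(err, req, res, next) {
  res.status(err.status || 500).json({ error: err.message });
}

// Assumed wiring in an Express app:
// app.get('/settings', validateToken, settings);
// app.use(errorHandler);
```

A custom error handler like this replaces the default response shown above, which is useful when the API has to return a specific JSON error body.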

Resources