Why does Node.js have both async & sync versions of fs methods?

In Node.js, I can do almost any async operation in one of two ways:
var file = fs.readFileSync('file.html');
or...
fs.readFile('file.html', function (err, data) {
  if (err) throw err;
  console.log(data);
});
Is the only benefit of the async one custom error handling? Or is there really a reason to have the file read operation non-blocking?

These exist mostly because node itself needs them to load your program's modules from disk when your program starts. More broadly, it is typical to do a bunch of synchronous setup I/O when a service first starts, before it accepts network connections. Once the program is ready to go (its TLS cert is loaded, the config file has been read, etc.), a network socket is bound, and from that point on everything is async.
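For example, a minimal sketch of that startup pattern (the file names and config layout here are placeholder assumptions):
const fs = require('fs');
const https = require('https');

// Synchronous, blocking reads are fine here: nothing else is running
// yet, and the server can't start without these files anyway.
const config = JSON.parse(fs.readFileSync('config.json', 'utf8'));
const tlsOptions = {
  key: fs.readFileSync('server-key.pem'),
  cert: fs.readFileSync('server-cert.pem')
};

// From this point on, everything is asynchronous.
https.createServer(tlsOptions, function (req, res) {
  res.end('hello\n');
}).listen(config.port);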

Asynchronous calls allow for the branching of execution chains and the passing of results through that execution chain. This has many advantages.
For one, the program can execute two or more calls at the same time, and do work on the results as they complete, not necessarily in the order they were first called.
For example if you have a program waiting on two events:
var file1;
var file2;

// Let's say this takes 2 seconds.
fs.readFile('bigfile1.jpg', function (err, data) {
  if (err) throw err;
  file1 = data;
  console.log("FILE1 Done");
});

// Let's say this takes 1 second.
fs.readFile('bigfile2.jpg', function (err, data) {
  if (err) throw err;
  file2 = data;
  console.log("FILE2 Done");
});

console.log("DO SOMETHING ELSE");
In the case above, the read of bigfile2.jpg will finish first, and something will be logged after only 1 second. So your output timeline might be something like:
#0:00: DO SOMETHING ELSE
#1:00: FILE2 Done
#2:00: FILE1 Done
Notice above that "DO SOMETHING ELSE" was logged right away, that FILE2 finished first after only 1 second, and that at 2 seconds FILE1 is done. Everything completed within a total of 2 seconds, though the callback order was unpredictable.
Whereas doing it synchronously it would look like:
file1 = fs.readFileSync('bigfile1.jpg');
console.log("FILE1 Done");
file2 = fs.readFileSync('bigfile2.jpg');
console.log("FILE2 Done");
console.log("DO SOMETHING ELSE");
And the output might look like:
#2:00: FILE1 Done
#3:00: FILE2 Done
#3:00: DO SOMETHING ELSE
Notice it takes a total of 3 seconds to execute, but everything happens in the order you called it.
Doing it synchronously typically takes longer for everything to finish (especially for external processes like filesystem reads, writes or database requests) because you are waiting for one thing to complete before moving onto the next. Sometimes you want this, but usually you don't. It can be easier to program synchronously sometimes though, since you can do things reliably in a particular order (usually).
When you execute filesystem methods asynchronously, however, your application can continue executing other, non-filesystem-related tasks without waiting for the filesystem work to complete. This is why database queries, filesystem operations, and network requests are generally handled with asynchronous methods: they allow other work to be done while the program waits on I/O and off-system operations.
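With modern Node, the same parallel read can be written with promises. A minimal sketch, assuming Node 10+ where fs.promises is available:
const fs = require('fs').promises;

async function loadBoth() {
  // Both reads start immediately and run concurrently; Promise.all
  // resolves once the slower one finishes (about 2 seconds total in
  // the example above, not 3).
  const [file1, file2] = await Promise.all([
    fs.readFile('bigfile1.jpg'),
    fs.readFile('bigfile2.jpg')
  ]);
  console.log('FILE1 and FILE2 Done:', file1.length, file2.length);
}

loadBoth();
console.log('DO SOMETHING ELSE'); // still logs first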
When you get into more advanced asynchronous method chaining, you can do some really powerful things, like creating scopes with closures in very little code, or building responders for particular events.
Sorry for the long answer. There are many reasons why you have the option to do things synchronously or not, but hopefully this will help you decide which method is best for you.

The benefit of the asynchronous version is that you can do other stuff while you wait for the IO to complete.
fs.readFile('file.html', function (err, data) {
  if (err) throw err;
  console.log(data);
});
// Do a bunch more stuff here.
// All this code will execute before the callback to readFile,
// but the IO will be happening concurrently. :)

You want to use the async version when you are writing event-driven code where responding to requests quickly is paramount. The canonical example for Node is writing a web server. Let's say a user makes a request that requires the server to perform a bunch of IO. If this IO is performed synchronously, the server blocks: it will not answer any other requests until it has finished serving this one. From the users' perspective, performance will seem terrible. So in a case like this, you want to use the asynchronous versions of the calls so that Node can continue processing requests.
The sync versions of the IO calls are there because Node is not used only for writing event-driven code. For instance, if you are writing a utility which reads a file, modifies it, and writes it back to disk as part of a batch operation, like a command line tool, using the synchronous versions of the IO operations can make your code easier to follow.
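For instance, a minimal sketch of such a tool (the file names and the uppercasing step are placeholders for real work):
const fs = require('fs');

// A one-shot batch job: nobody else is waiting on this process,
// so blocking calls keep the control flow straightforward.
const input = fs.readFileSync('input.txt', 'utf8');
const output = input.toUpperCase(); // stand-in for the real processing
fs.writeFileSync('output.txt', output);
console.log('done');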

Related

Are Database connections asynchronous in Node?

Node is single-threaded, but there are a lot of functions (in modules like http and fs) that allow us to run a background task, and the event loop takes care of executing the callbacks.
However, is this true for a database connection?
Let's say I have the following code.
const mysql = require('mysql');

function callDatabase(id) {
  var result;
  var connection = mysql.createConnection({
    host     : '192.168.1.14',
    user     : 'root',
    password : '',
    database : 'test'
  });
  connection.connect();
  var queryString = 'SELECT name FROM test WHERE id = ?';
  connection.query(queryString, [id], function (err, rows, fields) {
    if (err) throw err;
    for (var i in rows) {
      result = rows[i].name;
    }
    connection.end();
    return result; // note: this returns from the callback, not from callDatabase
  });
}
Do mysql.createConnection, connection.connect, connection.query, and connection.end spin up a new thread to execute in the background, leaving Node to run the remaining synchronous code?
If so, in what queue is the callback enqueued, and how do I write this sort of code so that a background task is initiated?
Anything that may block (file system operations, network connections, etc.) is generally asynchronous in Node, in order to avoid blocking the main thread. That these functions take a callback parameter is a sure hint that you have asynchronous operations (or "background tasks") in progress.
You don't show it in your sample code, but connect() and end() do take callback functions so you know when a connection is actually made or ends. It looks like the mysql library, however, also maintains an internal queue to make sure you can't attempt a query until a connection has been made and that only one operation at a time can be executed.
Note that createConnection() does not have a callback function. All it does is create a new data structure (connection) that gets used. It doesn't do any I/O itself, so doesn't need to run asynchronously.
Also note that you don't generally "spin up" your own threads. Node takes care of thread management for you (your JavaScript runs on the single main thread, with I/O handled behind the scenes) and hides how the threads themselves work from most developers. You typically hear that Node is "single threaded", and you should treat it that way.
Modern Node code makes extensive use of async/await and Promises to do this sort of thing. Slightly older code uses callback functions. Even older code uses Node events. In reality, if you dig far enough down, they're all using events, possibly presented through the simplified (more modern) interfaces.
The mysql module appears to date from the "callback" era and hasn't yet been updated for Promises/async/await. Under the covers, as noted, it uses Node events to track network (or unix domain socket) connections and transfers.
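If you want to use the module with async/await anyway, you can wrap the callback API yourself. A minimal sketch using util.promisify (note that promisify keeps only the first callback value, rows, and drops fields):
const util = require('util');
const mysql = require('mysql');

const connection = mysql.createConnection({
  host     : '192.168.1.14',
  user     : 'root',
  password : '',
  database : 'test'
});

// Bind so `this` inside query still refers to the connection.
const query = util.promisify(connection.query).bind(connection);

async function callDatabase(id) {
  try {
    // Parameterized query; rows is the promisified result.
    const rows = await query('SELECT name FROM test WHERE id = ?', [id]);
    return rows.length ? rows[0].name : null;
  } finally {
    connection.end();
  }
}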

Non blocking Loop in Node.js and pooling?

I'm starting to play around with node.js, and I have an application which basically iterates over tens of thousands of objects, performs various asynchronous HTTP requests for each of them, and populates each object with data returned from those requests.
This question is more concerning best practices with Node.js, non blocking operations and probably related to pooling.
Forgive me If I'm using the wrong term, as I'm new to this and please don't hesitate to correct me.
So below is a brief summary of the code. I have a loop which iterates over thousands of objects:
// Loop, briefly summarized
for (var i = 0; i < arrayOfObjects.length; i++) {
  do_something(arrayOfObjects[i], function (err, result) {
    if (err) {
      // various logging
    } else {
      console.log(result);
    }
  });
}
// do_something, briefly summarized
function do_something(object, callback) {
  http.request(url1, function (err, result) {
    if (!err) {
      insert_in_db(result.value1, function (err, result) {
        // another asynchronous http request
      });
    } else {
      // various error logging
    }
  });
  http.request(url2, function (err, result) {
    // some further logic, including a db call
  });
}
In reality do_something contains more complex logic, but that's not the point right now.
So my problems are the following:
I think the main issue is that my loop is not really optimized, because it's kind of a blocking operation.
The first http request results within do_something are only available after the loop has finished processing, and then it cascades.
Is there a way to make a pool of, say, 10 or 20 maximum simultaneous executions of do_something, with the rest queued until a pool resource is available?
I hope I explained myself clearly; don't hesitate to ask if you need more details.
Thanks in advance for your feedback,
Anselme
Your loop isn't blocking, per se, but it's not optimal. One of the things it does is schedule arrayOfObjects.length http requests, and those requests will all be scheduled right away, as your loop progresses.
In older versions of node.js you got a default limit of 5 concurrent requests per host, but that default was later raised (the default agent no longer caps concurrent sockets).
But then the actual opening of sockets, sending of requests, and waiting for responses happens individually for each iteration, and each entry will finish in its own time (depending, in this case, on the remote host, database response times, etc.).
Take a look at async, vasync, or one of their many alternatives, as suggested in the comments, for pooling.
You can take it even a step further and use something like Bluebird's Promise.map with the concurrency option set, depending on your use case.
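For instance, a minimal sketch of such a pool using the async module's eachLimit (it assumes your do_something calls its callback exactly once when it finishes):
var async = require('async');

// At most 10 invocations of do_something are in flight at any one
// time; the rest wait in async's internal queue until a slot frees up.
async.eachLimit(arrayOfObjects, 10, do_something, function (err) {
  if (err) {
    console.error('one of the tasks failed:', err);
  } else {
    console.log('all done');
  }
});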

Synchronous NodeJs (or other serverside JS) call

We are using Node for development; 95% of our code is async and working fine.
For the remaining 5% (one small module), which is sync in nature [and depends on other third-party software], we are looking for:
1. Code that blocks until the callback has finished.
2. Only one instance of function1 + its callback executing at a time.
PS 1: I completely agree that Node is meant for async work and we should avoid this, but this is a separate, non-realtime process.
PS 2: If not with Node, is there any other server-side JS framework? The last option is to use another language like Python, but if anything is possible in JS, we are ready to give it a shot!
Seq should solve your problem.
For an overview of sync-style flow control modules, have a look at http://nodejsrocks.blogspot.de/2012/05/how-to-avoid-nodejs-spaghetti-code-with.html
Seq()
  .seq(function () {
    var step = this; // capture Seq's step callback before entering the nested function
    mysql.query("select * from foo", [], function (err, rows, fields) {
      step(err, rows);
    });
  })
  .seq(function (mysqlResult) {
    console.log("mysql callback returns: " + mysqlResult);
  });
There are lots and lots of options, look at node-async, kaffeine async support, IcedCoffeescript, etc.
I want to make a plug for IcedCoffeeScript since I'm its maintainer. You can get by with solutions like Seq, but in general you'll wind up encoding control flow with function calls. I find that approach difficult to write and maintain. IcedCoffeeScript makes simple sequential operations a breeze:
console.log "hello, just wait a sec"
await setTimeout defer(), 100
console.log "ok, what did you want"
But more important, it handles any combination of async code and standard control flow:
console.log "Let me check..."
if isRunningLate()
console.log "Can't stop now, sorry!"
else
await setTimeout defer(), 1000
console.log "happy to wait, now what did you want?"
resumeWhatIWasDoingBefore()
Also loops work well, here is serial dispatch:
for i in [0...10]
  await launchRpc defer res[i]
done()
And here is parallel dispatch:
await
  for i in [0...10]
    launchRpc defer res[i]
done()
Not only does ICS make sequential chains of async code smoother, it also encourages you to do as much as possible in parallel. If you need to change your code or your concurrency requirements, the changes are minimal, not a complete rewrite (as it would be in standard JS/CS or with some concurrency libraries).

What is a good way to exit a node.js script after "everything has been done"

My node.js script reads rows from a table in database 1, does some processing and writes the rows to database 2.
The script should exit after everything is done.
How can I know if everything has been done and exit node then?
If I have a callback function like this:
function exit_node() {
process.exit();
}
(Edit: in the meantime it became obvious that process.exit() could be also replaced with db.close() - but this is not the question what to put in there. The question is at which time exactly to do this, i.e. how and where to execute this callback.)
But it is not easy to attach it anywhere. Attaching it after the last read from db1 is not correct, because the processing and writing still have to happen.
Attaching it to the write to db2 is not easy either, because it has to be attached after the last write, but each write is independent and does not know whether it is the last one.
It could also theoretically happen that the last write finishes while another write started before it is still executing.
Edit: Sorry, I can see the explanation of the question is not complete and probably confusing, but some people still understood and there are good answers below. Please continue reading the comments and the answers and it should give you the whole picture.
Edit: I could think of some "blocking" controller mechanism: the different parts of the script add blockers to the controller for each open "job" and release them after the job is finished, and the controller exits the script when no more blockers are present. Maybe async could help: https://github.com/caolan/async
I also fear this would blow up the code and the logic unreasonably.
JohnnyHK gives good advice; except Node.js already does option 2 for you.
When there is no more i/o, timers, intervals, etc. (no more work expected), the process will exit. If your program does not automatically exit after all its work is done, then you have a bug. Perhaps you forgot to close a DB connection, or to clearTimeout() or clearInterval(). But instead of calling process.exit() you might take this opportunity to identify your leak.
The two main options are:
Use an asynchronous processing coordination module like async (as you mentioned).
Keep your own count of outstanding writes and then exit when the count reaches 0.
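For illustration, the counting approach from option 2 might look like this minimal sketch (it assumes every write is scheduled before the first one completes; otherwise you would also need a "no more rows coming" flag):
var pending = 0;

function writeRow(newRow) {
  pending++;
  db2.insert(newRow, function (err) {
    if (err) throw err;
    pending--;
    // Only close (and let the process exit) once every outstanding
    // write has called back.
    if (pending === 0) {
      db2.close();
    }
  });
}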
Though using a counter could be tempting (it's a simple, easily implementable idea), it will pollute your code with unrelated logic.
db1.query(selector, function (err, results) {
  db1.close();
  // let's suppose the callback gets an array of results;
  // once all writes are finished, `db2.close` will be called by async
  async.forEach(results, processRow, db2.close);
});

// async iterator
function processRow(row, cb) {
  // modify the row somehow
  var newRow = foo(row);
  // insert it into the other db;
  // once the write is done, call `cb` to notify async
  db2.insert(newRow, cb);
}
Although using process.exit here feels to me like C++'s new without delete. Maybe a bad comparison, but I can't help it :)

"Spawn a thread"-like behaviour in node.js

I want to add some admin utilities to a little Web app, such as "Backup Database". The user will click on a button and the HTTP response will return immediately, although the potentially long-running process has been started in the background.
In Java this would probably be implemented by spawning an independent thread, in Scala by using an Actor. But what's an appropriate idiom in node.js? (code snippet appreciated)
I'm now re-reading the docs; this really does seem like a Node 101 question, but that's pretty much where I am on this... Anyhow, to clarify, this is the basic scenario:
function onRequest(request, response) {
  doSomething();
  response.writeHead(202, headers);
  response.end("doing something");
}

function doSomething() {
  // long-running operation
}
I want the response to return immediately, leaving doSomething() running in the background.
OK, given the single-threaded model of Node, that doesn't seem possible without spawning another OS-level child process. My misunderstanding.
In my code, what I need for the backup is mostly I/O based, so Node should handle that in a nice async fashion. What I think I'll do is shift doSomething() to after the response.end and see how that behaves.
As supertopi said, you could have a look at child processes. But I think it will hurt the performance of your server if this happens a lot. In that case I think you should queue the jobs instead and have a look at asynchronous message queues to process them offline (distributed). Two examples of message queues are beanstalkd and gearman.
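If you do go the child-process route, a minimal sketch might look like this (mysqldump and its arguments are placeholder assumptions):
var spawn = require('child_process').spawn;

function onRequest(request, response) {
  // Start the backup in a separate OS process; spawn returns
  // immediately, so the HTTP response isn't held up.
  var backup = spawn('mysqldump', ['--result-file=/tmp/backup.sql', 'mydb']);
  backup.on('exit', function (code) {
    console.log('backup finished with exit code ' + code);
  });
  response.writeHead(202);
  response.end('backup started');
}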
I don't see the problem. All you need to do is have doSomething() start an asynchronous operation. It'll return immediately, your onRequest will write the response back, and the client will get their "OK, I started" message.
function doSomething() {
  openDatabaseConnection(connectionString, function (conn) {
    // This is called some time later, once the connection is established.
    // Now you can tell the database to back itself up.
  });
}
doSomething won't just sit there until the database connection is established, or wait while you tell it to back up. It'll return right away, having registered a callback that will run later. Behind the scenes, your database library is probably creating some threads for you, to make the async work the way it should, but your code doesn't need to worry about it; you just return right away, send the response to the client right away, and the asynchronous code keeps running asynchronously.
(It's actually more work to make this run synchronously -- you would have to pass your response object into doSomething, and have doSomething do the response.end call inside the innermost callback, after the backup is done. Of course, that's not what you want to do here; you want to return immediately, which is exactly what your code will do.)
