NodeJS Synchronous IO Implementation - node.js

Node.js has synchronous versions of file operations:
fs.writeFileSync(file, data, ...)
According to this blog, the underlying OS call is still asynchronous (verified with DTrace), and all the sync version does is "block the event loop".
What does it mean to block the event loop (on purpose)? Is it something like a continuous setImmediate() or something more low-level?

What does it mean to block the event loop (on purpose)?
It just means V8 doesn't run any userland JavaScript code while waiting for the IO to complete. Normally V8 would keep executing JavaScript while the IO is in progress, both your own code and that of any libraries your application uses, but in this case it doesn't. One way to think of it: your entire program is paused while the IO happens, whereas normally your program continues to execute.
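To make this concrete, here is a small Node.js sketch (the temporary file name is made up for the demo). A timer that is due immediately still cannot run until writeFileSync returns, while the asynchronous writeFile hands control back at once and reports completion through a callback later.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical scratch file used only for this demo.
const file = path.join(os.tmpdir(), `sync-io-demo-${process.pid}.txt`);
const events = [];

// A timer that is due "immediately"...
setTimeout(() => events.push('timer fired'), 0);

// ...still has to wait: while writeFileSync runs, no other JavaScript
// callback can run on the event loop.
fs.writeFileSync(file, 'x'.repeat(1024 * 1024));
events.push('writeFileSync returned');

// The asynchronous version returns right away; its callback runs on a later loop iteration.
fs.writeFile(file, 'done', (err) => {
  if (err) throw err;
  events.push('writeFile callback ran');
  fs.unlinkSync(file); // clean up the temporary file
});
events.push('writeFile returned');
```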

Related

Node.js: why are idle and prepare phases only used internally?

The Node.js documentation describes the so-called phases of its underlying event loop.
It also explicitly states that the idle and prepare phases are only used internally.
Since the event loop of Node.js is the one from libuv, it goes without saying that those phases are probably mapped onto libuv's idle and prepare handles.
They would help achieve finer granularity when organizing tasks in software. In particular, they are the only way to schedule something between the execution of the I/O callbacks and the poll phase.
Anyway, they are not exported from the underlying environment.
What is the reason those phases have been forbidden, effectively giving users an apparently poorer event loop than the one libuv offers?
Is there any other way to schedule tasks the way mentioned above?
Side note: it's just curiosity.
I used to work with both libuv and Node.js and noticed this, so I want to know whether there is a technical reason for it, or whether that is simply how it was designed, with no particular reason.
I don't think there is a specific reason to "forbid" them. Moreover, they are not really forbidden, they are just not exposed. You could create a Node addon which allows you to create idle and prepare handles and there would be no problem at all. There are some things you must be aware of:
Idle handles have a terrible name: they don't run when the loop is actually idle. They run once per loop iteration, after the timers, and if any idle handle is active, the loop will block for i/o for zero seconds. So they can be dangerous, because the CPU will spin if you don't stop them.
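Node does not expose idle handles, but a setImmediate callback that keeps re-queuing itself shows the same hazard in plain JavaScript. As far as I can tell, Node keeps an internal idle handle active while immediates are queued, so the loop polls with a zero timeout instead of sleeping. The sketch below (the 50 ms cutoff is arbitrary) counts how many times the loop goes round before a timer stops it.

```javascript
let spins = 0;
let stopped = false;

// Re-queue itself with setImmediate; each run happens on a new loop iteration.
function spin() {
  spins += 1;
  if (!stopped) setImmediate(spin);
}
setImmediate(spin);

// Stop after ~50 ms. Until then the loop never sleeps in the poll phase,
// so the process keeps a CPU core busy even though it does no useful work.
setTimeout(() => {
  stopped = true;
  console.log(`loop went round roughly ${spins} times in ~50 ms`);
}, 50);
```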
Callbacks registered with process.nextTick are called when the C++ <-> JS boundary is crossed (see calls to MakeCallback) so i/o callbacks could be deferred and run a bit later. If you exposed prepare handles to JS you would use MakeCallback in the C++ code, so some of the process.nextTick callbacks would also be called alongside your prepare callbacks.
As a general note: idle, check and prepare handles were somehow inherited from libev (which libuv used to use internally). Check and prepare can be used when embedding libuv with other libraries and idle handles are a bit weird, as I mentioned above. Also, libuv follows its own path these days, so not everything libuv has will end up exposed in Node land.
You could ask the reverse question: why do you need the idle phase, for example, to be exposed? You can just use setImmediate().
Also, why do you want to execute something between the I/O callbacks and the poll phase, when you don't explicitly control those things anyway?

Node.js framework & Express.js

What are the best resources to learn Express.js? Can anybody explain the Node.js framework and how exactly it works?
The non-blocking event loop concept.
I've found the Express website explains things pretty well, and Express to be quite approachable for new users.
A multi-threaded system (Java and the underlying JVM, for instance) contains many threads of execution. They may each run their own instructions simultaneously on a multi-core CPU, or the OS may switch between them, letting each thread run for a scheduled period of time before scheduling the next one.
Node programs are executed in the Node environment, where JavaScript runs on a single thread: there is only one thread of code execution for the entire program, and no multiple threads run your code concurrently.
A simple analogy would be comparing the event loop with a standard programming construct, the while loop, which is essentially what it is.
while(1){
// Node sets this up. Do stuff.. Runs until our program terminates.
}
Starting a node program would start this loop. You could imagine your program being inserted into this loop.
Suppose the first instruction in your program reads a file from disk. That request is dispatched to the underlying OS system call that reads the file.
Node provides both asynchronous and synchronous functions for things like reading a file. The asynchronous one is generally preferred because, in a single-threaded system, a synchronous call halts the entire program until the read completes (or fails).
while(1){
  require('fs').readFileSync('file.txt');
  // stop everything until the OS reports the file has been read
}
In the (preferred) asynchronous version, the request to read the file is issued to the OS together with a callback function, and the loop continues. The program essentially waits for the OS to respond, and on a later loop iteration (aka tick), once the read has completed, the system calls your callback function (essentially just a location in memory) with the result.
while(1){
  // 1st iteration does this: start the read and move on
  require('fs').readFile('file.txt', callback);
  // a later iteration does this: the system calls our callback function with the result
  callback(err, result)
}
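The while-loop analogy can be turned into a runnable toy. The sketch below is only a model, not how Node or libuv is implemented; `runToyLoop`, `fakeRead`, and the tick delays are all hypothetical. The program's top-level code only starts the reads, and the loop keeps going round, running each callback once its pretend I/O has finished, until nothing is left in flight.

```javascript
// Toy model of the "event loop is a while loop" analogy. NOT Node's or
// libuv's real implementation; fakeRead and its delays are hypothetical.
function runToyLoop(program) {
  const log = [];
  const inFlight = []; // "I/O requests" handed off to the pretend OS
  let tick = 0;

  // Hypothetical async API: the pretend OS completes the read after `delay` ticks.
  function fakeRead(name, delay, callback) {
    inFlight.push({ name, doneAt: tick + delay, callback });
  }

  // Run the program's top-level code once; it only *starts* I/O.
  program(fakeRead, log);

  // Keep looping while any I/O is outstanding, like Node staying alive
  // while there is pending work.
  while (inFlight.length > 0) {
    tick += 1;
    const ready = inFlight.filter((req) => req.doneAt <= tick);
    for (const req of ready) inFlight.splice(inFlight.indexOf(req), 1);
    // Callbacks run one at a time, on this single thread.
    for (const req of ready) req.callback(null, `contents of ${req.name}`);
  }
  return log;
}

const log = runToyLoop((read, log) => {
  read('a.txt', 2, (err, data) => log.push(data));
  read('b.txt', 1, (err, data) => log.push(data));
  log.push('main program finished');
});
console.log(log);
```

The log shows the main program finishing first, then b.txt (shorter delay) before a.txt, even though a.txt was requested first.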
A single-threaded system has some anticipated advantages. One is that the OS does not need to switch contexts between the program's threads, which removes the overhead of that task.
The other is the simplicity of programming with callback functions as a means to implement asynchronicity, although how this compares with the way other systems and programming languages handle it is a hotly debated topic.
There are many good resources to learn Express.js e.g.:
http://shop.oreilly.com/product/0636920032977.do
https://www.udemy.com/all-about-nodejs/
https://www.manning.com/books/express-in-action
https://www.packtpub.com/web-development/mastering-web-application-development-express
http://expressjsguide.com/
https://github.com/azat-co/expressworks
You may want to check also these blogs:
https://codeforgeek.com/2014/10/express-complete-tutorial-part-1/
https://strongloop.com/strongblog/category/express/

What is the difference between single thread and non-blocking I/O operation in NodeJs?

I've been reading up and going through as much of the Node.js code as I can, but I'm a bit confused about this:
What exactly does it mean for Node to be single-threaded, and what does non-blocking I/O mean? I can achieve the first by spawning a child process and the second by using the async library. But I want to be clear about what they mean and how non-blocking I/O can still slow down your app.
I'll try my best to explain.
Single-threaded means that the Node.js JavaScript runtime, at any given moment, is executing only one piece of code out of all the code it has loaded. In effect, it starts somewhere and works its way down through all the instructions (the call stack) until it's done. While it's executing code, nothing can interrupt it, and all I/O callbacks must wait. Thankfully, most call stacks are relatively short, and much of what we do in Node.js is "bookkeeping" rather than CPU-heavy work.
Because it is single-threaded, though, any instruction that takes a long time is a huge problem for the system's responsiveness. The runtime can only do one thing at a time, so everything else must wait until that instruction has finished. If an "I/O" instruction (say, reading from disk) blocked execution, the system would be needlessly unavailable for that whole time.
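The cost of long synchronous work is easy to measure. In this sketch, a 10 ms timer is scheduled and then the thread is kept busy for about 200 ms (an arbitrary figure standing in for any long-running instruction); the timer cannot fire until the busy loop ends.

```javascript
const start = Date.now();
let firedAfterMs;

// Ask for a callback in 10 ms...
setTimeout(() => {
  firedAfterMs = Date.now() - start;
  console.log(`10 ms timer actually fired after ${firedAfterMs} ms`);
}, 10);

// ...then hog the only JavaScript thread for ~200 ms with synchronous work.
const busyUntil = Date.now() + 200;
while (Date.now() < busyUntil) {
  // Spin. No callback (timer, I/O, anything) can run until this loop ends.
}
```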
But thankfully, we've got non-blocking I/O.
Instead of waiting for a file to be read:
console.log(readFileSync(filePath))
you write your code so that you DON'T wait for a file to be read:
readFile(filePath)
The readFile call returns almost instantly (perhaps within microseconds), so the runtime can continue executing the instructions that come next. But if the readFile call returns before the data has been read, there's no way it can return the file contents. That's where callbacks come in:
readFile(filePath, function(err, contents) { console.log(contents) })
Still, the readFile call returns almost instantly, and the runtime can continue. It first finishes the work in front of it (all the instructions that come after readFile). Nothing is done with the function that's passed, other than storing a reference to it.
Then, at some later point in time (perhaps 10ms, 100ms, or 1000ms later), when reading the file has completed, the callback is called with the full contents of the file as its second argument. Before that time, the runtime may have done any number of other batches of work.
Now I will address your comments about spawning child processes and the Async library. You are wrong on both counts.
Spawning a child process is a way to let Node.js use more than one CPU core. Because it is single-threaded, a single Node.js process executes your JavaScript on only one core at a time. Still, if you are on a multi-core computer, you may want to use all those cores; hence, start multiple Node.js processes.
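As a sketch of that approach, the parent process below hands a CPU-heavy loop (a made-up workload) to a separate Node.js process via `child_process.execFile`, and its own event loop keeps servicing a timer in the meantime. Node's built-in `cluster` module is another option when the goal is to spread a server across cores.

```javascript
const { execFile } = require('child_process');

// Hypothetical CPU-heavy workload, run in a separate Node.js process.
const childCode = `
  let sum = 0;
  for (let i = 0; i < 1e7; i++) sum += i;
  process.stdout.write(String(sum));
`;

let result;
let ticks = 0;
// The parent's event loop stays free to run this while the child computes.
const timer = setInterval(() => { ticks += 1; }, 1);

// process.execPath is the path of the running node binary.
execFile(process.execPath, ['-e', childCode], (err, stdout) => {
  clearInterval(timer);
  if (err) throw err;
  result = Number(stdout);
  console.log(`child computed ${result}; parent timer ticked ${ticks} times meanwhile`);
});
```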
The Async library will not give you non-blocking I/O; Node.js gives you that. What Node.js does not give you itself is an easy way to deal with data coming in from multiple callbacks, and the Async library can help a great deal with that.
As I'm not an expert on Node.js internals, I welcome corrections!
Related questions:
asynchronous vs non-blocking
What's the difference between: Asynchronous, Non-Blocking, Event-Base architectures?

Does Node.js actually use multiple threads underneath?

After all the literature I've read on Node.js, I still come back to the question: does Node.js itself make use of multiple threads under the hood? I think the answer is yes. Take the simple async file read example: something has to do the work of reading the file, and if Node's main event loop is not processing this work, then there must be a POSIX thread running somewhere that takes care of the file reading and, upon completion, places the callback in the event loop to be executed. So when we say Node.js runs in one thread, do we actually mean that the event loop of Node.js is only one thread? Or am I missing something here?
To a JavaScript program on Node.js, there is only one thread.
If you're looking for technicalities, Node.js is free to use threads to implement asynchronous I/O if the underlying operating system requires it (libuv, for instance, runs file system operations on a small thread pool).
The important thing is never to break the "there is only one thread" abstraction for the JavaScript program. If there are more threads, all they can do is queue up work for the main thread in the JavaScript program; they can never execute any JavaScript code themselves.
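The thread pool is visible from JavaScript only indirectly. Node's documentation lists `crypto.pbkdf2()` among the APIs that run on libuv's thread pool, so in the sketch below four key derivations (the password, salts, and iteration count are arbitrary) are computed off the main thread while the main thread keeps running an interval timer, and every completion callback still runs on the single JavaScript thread.

```javascript
const crypto = require('crypto');

let timerTicks = 0;
let hashesDone = 0;

// While the hashes are computed on libuv's thread pool, the main thread is
// free to keep running JavaScript, e.g. this interval callback.
const interval = setInterval(() => { timerTicks += 1; }, 5);

for (let i = 0; i < 4; i++) {
  // Each call hands the CPU-heavy key derivation to a pool thread.
  crypto.pbkdf2('secret', `salt-${i}`, 100000, 64, 'sha512', (err) => {
    if (err) throw err;
    hashesDone += 1; // the callback itself runs back on the main thread
    if (hashesDone === 4) {
      clearInterval(interval);
      console.log(`4 hashes done; main-thread timer ticked ${timerTicks} times meanwhile`);
    }
  });
}
```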

What is the difference between asynchronous I/O and asynchronous function?

Node.js is described as using asynchronous I/O. What does this actually mean?
Is there a difference between that and creating an async function by spawning another thread to do the work?
e.g.
void asyncFunction() {
    Thread apple = new Thread() {
        public void run() {
            // ...do stuff
        }
    };
    apple.start();
}
If there is a difference, can I do asynchronous I/O in JavaScript?
Asynchronous I/O
Asynchronous I/O (from Wikipedia)
Asynchronous I/O, or non-blocking I/O, is a form of input/output processing that permits other processing to continue before the transmission has finished.
What this means is that if a process does a read() or write() as a synchronous call, it has to wait until the hardware finishes the physical I/O before it can be informed of the success or failure of the operation.
In asynchronous mode, once the process issues a read/write asynchronously, the system call returns immediately, as soon as the I/O has been passed down to the hardware or queued in the OS/VM. The process's execution therefore isn't blocked (hence the name non-blocking I/O), since it doesn't need to wait for the result of the system call; it receives the result later.
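Network sockets in Node follow this model. In the sketch below (a throwaway echo server on a random local port), `client.write()` returns immediately after handing the bytes off, and the echoed reply arrives later as a `data` event.

```javascript
const net = require('net');

const events = [];

// Throwaway echo server: send back whatever arrives, then close the connection.
const server = net.createServer((socket) => {
  socket.on('data', (chunk) => socket.end(chunk));
});

// Port 0 asks the OS for any free port on the loopback interface.
server.listen(0, '127.0.0.1', () => {
  const { port } = server.address();
  const client = net.connect(port, '127.0.0.1', () => {
    client.write('ping');           // returns right away; the bytes are sent later
    events.push('write returned');
  });
  client.on('data', (chunk) => {
    events.push(`received ${chunk}`); // the result arrives later, as an event
  });
  client.on('end', () => server.close());
});
```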
Asynchronous Function
An asynchronous function is a function that returns data to the caller by means of an event handler (or callback function). The callback can be called at any time, depending on how long the asynchronous function takes to complete. This is unlike a synchronous function, which executes its instructions before returning a value.
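The distinction in the question title can be shown in a few lines: a function can be asynchronous in the callback sense without doing any asynchronous I/O. `sumAsync` and `sumSync` are hypothetical names; `sumAsync` merely defers its CPU work with `setImmediate`, so the work still runs on the same single thread, just later.

```javascript
// Asynchronous in the callback sense: returns before its result is ready.
// No I/O is involved; the work just runs later on the same thread.
function sumAsync(numbers, callback) {
  setImmediate(() => {
    const total = numbers.reduce((a, b) => a + b, 0);
    callback(null, total);
  });
}

// Synchronous counterpart: computes the value before returning it.
function sumSync(numbers) {
  return numbers.reduce((a, b) => a + b, 0);
}

const steps = [];
sumAsync([1, 2, 3], (err, total) => steps.push(`async total ${total}`));
steps.push(`sync total ${sumSync([1, 2, 3])}`);
```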
...can I do asynchronous I/O in Java?
Yes, Java NIO provides non-blocking I/O support via Selectors. Apache MINA, a networking framework, also includes non-blocking I/O. A related SO question answers this as well.
There are several great articles regarding asynchronous code in node.js:
Asynchronous Code Design with Node.js
Understanding Event-driven Programming
Understanding event loops and writing great code for Node.js
In addition to The Elite Gentleman's answer: Node doesn't spawn a new thread for each asynchronous I/O call. All of your JavaScript runs in a single-threaded event loop. That is why it is really important to avoid the synchronous versions of I/O functions, such as fs.readSync, unless absolutely necessary.
You may read this excellent blog post for some insight: http://blog.mixu.net/2011/02/01/understanding-the-node-js-event-loop/
I was investigating the same question since this async IO pattern was very new to me. I found this talk on infoq.com which made me very happy. The guy explains very well where async IO actually resides (the OS -> the kernel) and how it is embedded in node.js as the primary idiom for doing IO. Enjoy!
http://www.infoq.com/presentations/Nodejs-Asynchronous-IO-for-Fun-and-Profit
Node.js enables a programmer to do asynchronous IO by requiring the use of callbacks. Callbacks are similar to the asynchronous event handlers we have been using for a long time to handle DOM events in JavaScript!
e.g.
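const fs = require('fs');
const fileName = __filename; // any readable file works; this script reads itself
fs.readFile(fileName, asynFuncReadTheActualContentsAndDoSomethingWithThem);
console.log('IDontKnowWhatIsThereInTheFileYet');
// Function declarations are hoisted, so the callback can be referenced above.
function asynFuncReadTheActualContentsAndDoSomethingWithThem(err, fileContents) {
  if (err) throw err;
  console.log('Now I know The File Contents ' + fileContents);
}
// Generally the output of the above program will be the following:
// IDontKnowWhatIsThereInTheFileYet
// ...after quite a bit of time:
// Now I know The File Contents <the file's contents>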
From https://en.wikipedia.org/wiki/Asynchronous_I/O
Synchronous blocking I/O
A simple approach to I/O would be to start the access and then wait for it to complete.
Would block the progress of a program while the communication is in progress, leaving system resources idle.
Asynchronous I/O
Alternatively, it is possible to start the communication and then perform processing that does not require that the I/O be completed.
Any task that depends on the I/O having completed ... still needs to wait for the I/O operation to complete, and thus is still blocked, but other processing that does not have a dependency on the I/O operation can continue.
Example:
https://en.wikipedia.org/wiki/Node.js
Node.js has an event-driven architecture capable of asynchronous I/O
