I was under the impression that I/O in node (in particular fs's file I/O) is carried out by using native async (i.e., event-based) APIs of the operating system. However the following comment (taken from https://nodejs.org/api/fs.html#threadpool-usage) suggests otherwise:
The promise APIs use the underlying Node.js threadpool to perform file system operations off the event loop thread. These operations are not synchronized or threadsafe. Care must be taken when performing multiple concurrent modifications on the same file or data corruption may occur.
The quoted text brings the following hypothesis: fs uses a blocking I/O API of the operating system (hence the need for a thread-pool), which, in turn, raises this concern/question:
In node can an I/O-intensive program get blocked/timeout/error-out due to lack of enough threads?
Related
I was going through the Node.js documentation and couldn't understand this line:
A Node.js app runs in a single process, without creating a new thread for every request. Node.js provides a set of asynchronous I/O primitives in its standard library that prevent JavaScript code from blocking and generally, libraries in Node.js are written using non-blocking paradigms, making blocking behavior the exception rather than the norm.
Source : Introduction to node js
I couldn't understand specifically:
[...] Node.js provides a set of asynchronous I/O primitives in its standard library that prevent JavaScript code from blocking [..]
Does it simply mean that it has built-in functionality that allows it to work asynchronously?
If not, then what are these asynchronous I/O primitives? If anyone could provide a link for better understanding or for getting started with Node.js, that would be really great.
P.S.: I have practical experience with Node.js, in that I understand how the code works but not why it works. I want to understand the theoretical part, so I can understand what is actually going on in the background.
Does it simply mean that it has built-in functionality that allows it to work asynchronously?
Yes, that's basically what it means.
In a "traditional" one-thread-per-connection* model you accept a connection and then hand off the handling of that request to a thread (either a freshly started one or from a pool, doesn't really change much) and do all work related to that connection on that one thread, including sending the response.
This can easily be done with synchronous/blocking I/O: have a read method that simply returns the bytes read and a write method that blocks until the writing is done.
This does, however, mean that the thread handling that request cannot do anything else, and also that you need many threads to be able to handle many concurrent connections/requests. And since I/O operations take a relatively huge amount of time (when measured against the speed of memory access and computation), most of these threads will spend most of their time waiting for one I/O operation or another.
Having asynchronous I/O and an event-based core architecture means that when an I/O operation is initiated, the CPU can immediately go on to processing whatever needs to be done next, which will likely be related to an entirely different request.
Therefore one can handle many more requests on a single thread.
The "primitives" just means that basic I/O operations such as "read bytes" and "write bytes" to/from network connections or files are provided as asynchronous operations and higher-level operations need to be built on top of those (again in an asynchronous way, to keep the benefits).
As a side note: many other programming environments have either had asynchronous I/O APIs for a long time or have gained them in recent years. The one thing that sets Node.js apart is that async is the default option: if you're reading from a socket or a file, then doing it asynchronously is what is "normal", and blocking calls are the big exception. This means that the entire ecosystem surrounding Node.js (i.e. almost all third-party libraries) is written with that assumption in mind.
So while Java, for example, has asynchronous I/O, you lose that advantage as soon as you use any I/O-related library that only supports blocking I/O.
* I use connection/request interchangeably in this answer, under the assumption that each connection carries a single request. That assumption is usually wrong these days (most common protocols allow multiple request/response pairs on a single connection), but handling multiple requests on a single connection doesn't fundamentally change anything about this answer.
It means Node.js doesn't halt on input/output operations. Suppose you need to do some task that contains a blocking condition, e.g. "if the space key is pressed, do this" or "while Esc isn't pressed, keep taking input". Since Node.js is single-threaded, a blocking call would stop all other operations: the one thread would focus on the blocking job until it finished. Asynchronous I/O allows the application not to halt other tasks while doing one; it carries on with other work until that task finishes. That's also why we use await to get values out of promises inside async functions in Node.js: when the data has been processed, execution comes back to the line where await is and picks up the value (or processes the information from it).
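A minimal sketch of that idea (the names and timings are made up): a timer stands in for an I/O task, other work proceeds while it is pending, and await only pauses the async function, not the whole program.

```javascript
// A timer stands in for any I/O-bound task (label and delay are illustrative).
function ioTask(label, ms) {
  return new Promise(resolve => setTimeout(() => resolve(label), ms));
}

async function main() {
  const order = [];
  // Kick off the "I/O" but don't wait for it yet.
  const pending = ioTask('slow', 50).then(label => order.push(label));
  order.push('other work'); // runs while the "I/O" is still in flight
  await pending;            // only now does this function wait for the result
  return order;             // 'other work' comes first, then 'slow'
}

main().then(order => console.log(order));
```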
My understanding is that the hardware architecture and the operating systems are designed not to block the cpu. When any kind of blocking operation needs to happen, the operating system registers an interruption and moves on to something else, making sure the precious time of the cpu is always effectively used.
It makes me wonder why most programming languages were designed with blocking APIs. But most importantly: since the operating system works in an asynchronous way when it comes to I/O, registering interruptions and dealing with results later, when they are ready, I'm really puzzled about how our programming language APIs escape this asynchrony. How does the OS provide synchronous system calls for programming languages that use blocking APIs?
Where does this synchrony come from? Certainly not from the hardware level. So, is there an infinite loop somewhere I don't know about, spinning and spinning until some interruption is triggered?
My understanding is that the hardware architecture and the operating systems are designed not to block the cpu.
Any rationally designed operating system would have a system service interface that does what you say. However, many non-rational operating systems do not work this way at the process level.
Blocking I/O is simpler to program than non-blocking I/O. Let me give you an example from the VMS operating system (Windoze works the same way under the covers). VMS has a pair of system services called SYS$QIO and SYS$QIOW; that is, Queue I/O Request, and Queue I/O Request and Wait. The two services take identical parameters. One pair of parameters is the address of a completion routine and a parameter to pass to that routine. However, these parameters are rarely used with SYS$QIOW.
If you do a SYS$QIO call, it returns immediately. When the I/O operation completes, the completion routine is called as a software interrupt. You then have to do interrupt programming in your application. We did this all the time. If you want your application to be able to read from 100 input streams simultaneously, this is the way you had to do it. It's just more complicated than doing simple blocking I/O with one device.
If a programming language were to incorporate such a callback system into its I/O statements, it would be mirroring VMS/RSX/Windoze. Ada uses the task concept to implement such systems in an operating-system-independent manner.
In the Eunuchs world, it was traditional to create a separate process for each device. That was simpler, until you had to both read from and write to each device.
Your observations are correct - the operating system interacts with the underlying hardware asynchronously to perform I/O requests.
The behavior of blocking I/O comes from threads. Typically, the OS provides threads as an abstraction for user-mode programs to use. But sometimes, green/lightweight threads are provided by a user-mode virtual machine like in Go, Erlang, Java (Project Loom), etc. If you aren't familiar with threads as an abstraction, read up on some background theory from any OS textbook.
Each thread has a state consisting of a fixed set of registers, a dynamically growing/shrinking stack (for function arguments, saved registers, and return addresses), and a next-instruction pointer. The implementation of blocking I/O is that when a thread calls an I/O function, the underlying platform hosting the thread (Java VM, Linux kernel, etc.) immediately suspends the thread so that it cannot be scheduled for execution, and also submits the I/O request to the platform below. When the platform receives the completion of the I/O request, it puts the result on that thread's stack and puts the thread on the scheduler's execution queue. That's all there is to the magic.
Why are threads popular? Well, I/O requests happen in some sort of context. You don't just read a file or write a file as a standalone operation; you read a file, run a specific algorithm to process the result, and issue further I/O requests. A thread is one way to keep track of your progress. Another way is known as "continuation passing style", where every time you perform an I/O operation (A), you pass a callback or function pointer to explicitly specify what needs to happen after I/O completion (B), but the call (A) returns immediately (non-blocking / asynchronous). This way of programming asynchronous I/O is considered hard to reason about and even harder to debug because now you don't have a meaningful call stack because it gets cleared after every I/O operation. This is discussed at length in the great essay "What color is your function?".
Note that the platform has no obligation to provide a threading abstraction to its users. The OS or language VM can very well expose an asynchronous I/O API to the user code. But the vast majority of platforms (with exceptions like Node.js) choose to provide threads because it's much easier for humans to reason about.
Does Charm++ support file handling? I mean, can we perform file operations (read/write) in Charm++? If yes, please give a simple example of file handling for better understanding.
You can do any kind of file I/O in Charm++, though you may have to take care to properly synchronize parallel file accesses (if doing parallel I/O, say, from all elements of a chare array). The options for doing I/O are essentially:
1) Do I/O from a dedicated object. You can reduce and broadcast data to and from that object, and use any serial I/O method you want. Since Charm++ is built on a message-driven execution paradigm, the I/O object will only be scheduled when it actually has work to do.
2) Do I/O from all objects. You can either use Charm++'s built-in asynchronous parallel I/O library "CkIO" directly from chare array elements, or you can use MPI-IO, HDF5, or whatever other parallel I/O library you want. To do the latter, you will want to use Charm++'s MPI interoperability features and do the I/O from a Charm++ "Group" or "Node Group" so that there is one I/O actor per PE or node.
Of course you can also do I/O from a subset of all objects, and you have the choice of using a single global file or one file per PE/node.
To see an example of CkIO's usage, have a look at tests/charm++/io/ in the Charm++ source. An example of MPI interop is in examples/charm++/mpi-coexist/.
What are the best resources to learn Express.js? Can anybody explain the Node.js framework and how exactly it works, in particular the non-blocking event loop concept?
I've found the Express website explains things pretty well, and Express to be quite approachable for new users.
A multi-threaded system (Java and the underlying JVM, for instance) contains many threads of execution. These may each execute their own instructions simultaneously (on a multi-core CPU), or be switched between, with each thread running for a scheduled period of time before the OS schedules the next thread for execution.
Node programs are executed in the Node environment, which is single-threaded: there is only a single thread of code execution for the entire program, with no multiple threads executing concurrently.
A simple analogy would be comparing the event loop with a standard programming construct, the while loop, which is exactly what it is.
while(1){
// Node sets this up. Do stuff.. Runs until our program terminates.
}
Starting a node program would start this loop. You could imagine your program being inserted into this loop.
If the first instruction in your program was to read a file from disk, that request would be dispatched to the underlying OS system call that reads the file.
Node provides asynchronous and synchronous functions for things like reading a file, although the asynchronous ones are generally preferred: in a single-threaded system, a problem while reading the file synchronously halts the entire program.
while(1){
require('fs').readFileSync('file.txt');
// stop everything until the OS reports the file has been read
}
In the (preferred) asynchronous version, the request to read the file is issued to the OS with a callback function specified, and the loop continues. The program essentially waits for the OS to respond, and on a later loop iteration (aka tick), your provided callback function (essentially just a location in memory) is called by the system with the result.
while(1){
// 1st loop does this
require('fs').readFile('file.txt', callback);
// 2nd loop does this, system calls our callback function with the result
callback(err, result)
}
There are anticipated advantages to a single-threaded system. One is that there is no context switching between threads for the OS to do, which removes the overhead of performing that task in the system.
The other (and how this compares against the way other systems and programming languages handle it is a hotly debated topic) is the simplicity of programming, using callback functions as the means to implement asynchronicity.
There are many good resources to learn Express.js e.g.:
http://shop.oreilly.com/product/0636920032977.do
https://www.udemy.com/all-about-nodejs/
https://www.manning.com/books/express-in-action
https://www.packtpub.com/web-development/mastering-web-application-development-express
http://expressjsguide.com/
https://github.com/azat-co/expressworks
You may also want to check these blogs:
https://codeforgeek.com/2014/10/express-complete-tutorial-part-1/
https://strongloop.com/strongblog/category/express/
I am trying to understand in which cases Node.js will be faster than its alternatives. I completely understand the terms asynchronous I/O and non-blocking I/O, but can't think of a use case where they would be useful. Can somebody give me an example?
Node is a prime example why async I/O is useful.
Node is (as far as the user is concerned) single-threaded, so waiting on synchronous I/O would stop the only thread that is executing code. Since there are no guarantees of how long I/O will take, that could make Node code run extremely slowly.
That's why Node uses almost exclusively async I/O: it allows the single thread to quickly offload I/O work to the operating system and continue code execution without interruption until the operating system notifies Node that the I/O operation is done.
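A small sketch of the payoff (the timings are arbitrary): two simulated I/O waits run concurrently, so the total wall time is roughly that of the longest one, not the sum.

```javascript
// setTimeout stands in for any I/O wait (network, disk, ...).
const delay = ms => new Promise(resolve => setTimeout(resolve, ms));

async function fetchBoth() {
  const start = Date.now();
  // Both "requests" are in flight at once; we only wait for the slower one.
  await Promise.all([delay(50), delay(50)]);
  return Date.now() - start; // roughly 50 ms, not 100 ms
}

fetchBoth().then(ms => console.log(`finished in ~${ms} ms`));
```

With blocking I/O on a single thread, the same two waits would have to run back to back.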
Node.js is primarily a server-side platform based on a single-threaded model, so all the I/O and CPU work has to be managed on that one thread.
We know that I/O operations are the basic blocking operations for a running thread (for example, getting input from a user, or reading a large file from a data center; these operations may hang the thread for some time, which may result in many client requests being held up).
To avoid the cases above, Node.js adopted single-threaded asynchronous non-blocking I/O (also avoiding the overhead of creating multiple threads, as in multi-threading).