Many people say that modern REST APIs should be "async". Their main argument is that on some platforms, for example Java, the "blocking" way of doing things produces many threads, while the "async" way limits the thread count and its overhead.
What I don't understand is how this is achieved.
Say I have an app in a framework like vert.x (it doesn't really matter, you can think of NodeJS as well) with 1_000_000 concurrent connections to a service that makes a request to a database. The framework lets each request be processed asynchronously during long I/O operations, so the database exchange looks syntactically asynchronous in the business logic code. BUT. As I understand it, a DB request is not made in a vacuum: it is processed in some other thread, and that thread actually blocks until the DB request is finished. So although the request's business logic looks async and non-blocking, the long-running operations called from that logic are actually blocking somewhere under the hood of the framework, and the more such operations there are, the more threads are consumed anyway (for NodeJS you can think of threads created in the C++ code of the framework itself).
So as I see the big picture: in the async approach there is only one thread that processes all the requests, which is fine, but there is also a bunch of threads doing the actual I/O work in the background. If you don't limit their count, the number of threads will be the same as for the blocking approach + 1. On the other hand, if you limit the size of the background thread pool programmatically, what is the benefit compared to a blocking approach that combines a queue for user requests with a limit on the number of request-processing threads?

Since you're asking a fairly low level question I'll answer with a low level answer. Hope you're comfortable with C.
First, a disclaimer: I'll be talking mostly about networking code because the only widely used database I know of that uses file I/O is sqlite. Since you're asking about postgres I can assume you're interested in how socket I/O (be it TCP sockets or unix local sockets) can work with only one thread.
At the core of almost all async systems and libraries is a piece of code that looks like this:
    while (1) {
        read_fd_set = active_fd_set;

        // This blocks until we receive a packet or until timeout expires:
        select(FD_SETSIZE, &read_fd_set, NULL, NULL, timeout);

        // Process timed events:
        timeout = process_timeout();

        // Process I/O:
        for (i = 0; i < FD_SETSIZE; ++i) {
            if (FD_ISSET(i, &read_fd_set)) {
                if (i == sock) {
                    /* Connection arriving on listening socket */
                    int new;
                    size = sizeof(clientname);
                    new = accept(sock, (struct sockaddr *) &clientname, &size);
                    FD_SET(new, &active_fd_set);
                }
                else {
                    /* Data arriving on an already-connected socket. */
                    if (read_from_client(i) < 0) {
                        close(i);
                        FD_CLR(i, &active_fd_set);
                    }
                }
            }
        }
    }
(code example paraphrased from a GNU socket programming example)
As you can see, the code above uses no threading whatsoever. Yet it can handle many connections simultaneously. If you take a look at the for loop it is also obvious that it is basically a simple state machine that processes sockets one at a time if they have any packets waiting to be read (if not it is skipped by the if (FD_ISSET...) statement).
Non-I/O events can logically only come from timed events. And that's where the timeout management (details not shown for clarity) comes in. All I/O related stuff (basically almost all your async code) gets called back from the read_from_client() function (again, details omitted for clarity).
There is zero code running in parallel.
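To make the single-threaded pattern concrete, here is a minimal sketch in JavaScript rather than C. It does not call select(); it only simulates one with a list of pending completions and a virtual clock. The names `TinyLoop` and `startIo` and the 50 ms query time are made up for illustration and are not part of Node.js or libuv.

```javascript
// A toy single-threaded event loop over simulated ("virtual") time.
// TinyLoop and startIo are hypothetical names, not a real Node.js or libuv API.
class TinyLoop {
  constructor() {
    this.now = 0;       // virtual clock in milliseconds
    this.pending = [];  // I/O operations that were started but have not finished
  }

  // Start a simulated I/O operation that completes `delayMs` from now.
  startIo(delayMs, callback) {
    this.pending.push({ readyAt: this.now + delayMs, callback });
  }

  // Stand-in for the select() loop: wait for the next completion, run its
  // callback, and repeat until nothing is pending. Returns the final clock.
  run() {
    while (this.pending.length > 0) {
      this.pending.sort((a, b) => a.readyAt - b.readyAt);
      const next = this.pending.shift();
      this.now = next.readyAt; // "block" until the earliest operation is ready
      next.callback();
    }
    return this.now;
  }
}

// 1000 requests, each waiting 50 ms for a (simulated) database reply.
const loop = new TinyLoop();
let completed = 0;
for (let i = 0; i < 1000; i++) {
  loop.startIo(50, () => { completed++; });
}
const elapsedMs = loop.run();
console.log(`${completed} requests finished in ${elapsedMs} ms of virtual time`);
```

All 1000 waits overlap, so the virtual clock ends at 50 ms rather than 50,000 ms, even though only one callback ever runs at a time.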
Where does the parallelization come from?
Basically the server you're connecting to. Most databases support some form of parallelism. Some support multithreading. Some even support node.js or vert.x style parallelism by supporting asynchronous disk I/O (like postgres). Some configurations of databases allow a higher level of parallelism by storing data on more than one server via partitioning and/or sharding and/or master/slave servers.
That's where the big parallelism comes from -- parallel computing. Most databases have very strong support for read parallelism but weaker support for write parallelism (master/slave setups for example allow you to write only to the master database). But this is still a big win because most apps read more data than they write.
Where does disk parallelism come from?
The hardware. Mostly this has to do with DMA, which can transfer data without the CPU. DMA is not one thing. It is more like a concept. Different systems like the PCI bus, SATA, USB and even the CPU RAM bus itself have various kinds of DMA to transfer data directly to RAM (and in the case of RAM, to transfer data higher up to the various levels of CPU cache) or to a faster buffer.
While waiting for the DMA to complete, the CPU is not doing anything. And while it is doing nothing, if a network packet happens to come in or a setTimeout() expires, the code that handles them can be executed on the CPU. All while a file is being read into RAM.
But Node.js docs keep mentioning I/O threads
Only for disk I/O. It's not impossible to do async disk I/O with a single thread. Tcl has done that for years and many other programming languages and frameworks have too. It's just very-very messy since BSD does it differently from Linux which does it differently from Windows and even OSX may be subtly different from BSD even though it is derived from it etc. etc.
For the sake of simplicity and solid reliability node developers have opted to process disk I/O in separate threads.
Note that even for socket I/O it is not as simple as the code example I gave above. Since select() has some limitations (for example, you're forced to loop over ALL sockets to check for incoming data even though most won't have incoming data), people have come up with better APIs. And obviously different OSes do it differently. That is why there are a lot of libraries created to handle cross platform event processing like libevent and libuv (the one node.js uses).
OK. But postgres still runs on my PC
Asynchronous, event-oriented systems do not automagically give you performance superpowers. What they DO give you is choice: the app server is blazing fast, so where you put your database servers and what database you use is up to you.
OK. But I can do this with threads. Why async?
Since 1999, many people have run many benchmarks and in the majority of cases single threaded (or low thread count), event-oriented systems have outperformed simple multithreaded systems. It was especially true in the old days of single CPU, single core servers. It is still partly true now (since cores are still limited).
That is why Apache was re-written into Apache2 to use a thread pool of async listeners and why Nginx was written from scratch to use a thread pool of async code.
Yes, on modern servers ideally you'd still want some threads in order to use all your CPUs. The alternative is a process pool like how the cluster module works in node.js. But you'd want the number of threads/processes to be constant or as constant as possible to avoid the overhead of context switching and thread creation.

This is true of some async frameworks where the JDBC client is still synchronous.
When querying a DB in Vert.x you reuse the same application threads.
Please see the following example:
    public void testMultipleThreads() throws InterruptedException {
        Vertx vertx = Vertx.vertx();
        System.out.println("Before starting server: " + Thread.activeCount());

        // Start server
        vertx.createHttpServer().
            requestHandler(httpServerRequest -> {
                // System.out.println("Request");
                httpServerRequest.response().end();
            }).
            listen(8080, o -> {
                System.out.println("Server ready");
            });

        // Start counting threads
        vertx.setPeriodic(500, (o) -> {
            System.out.println(Thread.activeCount());
        });

        // Create requests
        HttpClient client = vertx.createHttpClient();
        int loops = 1_000_000;
        CountDownLatch latch = new CountDownLatch(loops);
        for (int i = 0; i < loops; i++) {
            client.getNow(8080, "localhost", "/", httpClientResponse -> {
                // System.out.println("Response received");
                latch.countDown();
            });
        }
        latch.await();
    }
You'll notice that the number of threads doesn't change, even though you serve as many connections as you would like. You can also add Vert.x JDBC client to test it.


I can imagine a situation where 100 requests come to a single Node.js server. Each of them requires some DB interaction, which is implemented with natively async code using the task queue or at least the microtask queue (e.g. the DB driver interface is promisified).
How does Node.js return a response when the request handler has stopped being sync? What happens to the connection from the api/web client where these 100 requests originated?
This feature is available at the OS level and is called (funnily enough) asynchronous I/O or non-blocking I/O (Windows also calls/called it overlapped I/O).
At the lowest level, in C (C#/Swift), the operating system provides an API to keep track of requests and responses. There are various APIs available depending on the OS you're on and Node.js uses libuv to automatically select the best available API at compile time but for the sake of understanding how asynchronous API works let's look at the API that is available to all platforms: the select() system call.
The select() function looks something like this:
    int select(int nfds, fd_set *readfds, fd_set *writefds, fd_set *exceptfds, struct timeval *timeout);
The fd_set data structure is a set/list of file descriptors that you are interested in watching for I/O activity. And remember, in POSIX sockets are also file descriptors. The way you use this API is as follows:
// Pseudocode:
// Say you just sent a request to a mysql database and also sent a http
// request to google maps. You are waiting for data to come from both.
// Instead of calling `read()` which would block the thread you add
// the sockets to the read set:
add mysql_socket to readfds
add maps_socket to readfds
// Now you have nothing else to do so you are free to wait for network
// I/O. Great, call select:
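// (the first argument is the highest-numbered descriptor plus one)
select(max(mysql_socket, maps_socket) + 1, &readfds, NULL, NULL, NULL);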
// Select is a blocking call. Yes, non-blocking I/O involves calling a
// blocking function. Yes it sounds ironic but the main difference is
// that we are not blocking waiting for each individual I/O activity,
// we are waiting for ALL of them
// At some point select returns. This is where we check which request
// matches the response:
check readfds if mysql_socket is set {
    then call mysql_handler_callback()
}
check readfds if maps_socket is set {
    then call maps_handler_callback()
}
go to beginning of loop
So basically the answer to your question is that we check a data structure to see which socket/file just triggered I/O activity and execute the appropriate code.
You no doubt can easily spot how to generalize this code pattern: instead of manually setting and checking the file descriptors you can keep all pending async requests and callbacks in a list or array and loop through it before and after the select(). This is in fact what Node.js (and javascript in general) does. And it is this list of callbacks/file-descriptors that is sometimes called the event queue - it is not a queue per-se, just a collection of things you are waiting to execute.
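As a rough illustration of that generalization, here is a sketch in JavaScript. The names `watch` and `dispatch` and the descriptor numbers are hypothetical, and the array passed to `dispatch` stands in for the set that select() fills in when it returns. This is not how libuv is actually written, only the shape of the idea.

```javascript
// Hypothetical sketch: a table that maps descriptors to callbacks.
// `watch` and `dispatch` are illustrative names, not a real Node.js API.
const watchers = new Map(); // descriptor -> callback

function watch(fd, callback) {
  watchers.set(fd, callback);
}

// `readyFds` plays the role of the set that select() fills in on return.
function dispatch(readyFds) {
  let ran = 0;
  for (const fd of readyFds) {
    const callback = watchers.get(fd);
    if (callback) {
      watchers.delete(fd); // one-shot: this request has been answered
      callback(fd);
      ran++;
    }
  }
  return ran;
}

const log = [];
watch(7, () => log.push('mysql reply handled'));
watch(9, () => log.push('maps reply handled'));

// Pretend select() reported that only descriptor 9 is readable.
dispatch([9]);
console.log(log);           // [ 'maps reply handled' ]
console.log(watchers.size); // 1
```

The mysql callback stays registered until its descriptor shows up in a later ready set, which is all that "waiting" means here.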
The select() function also has a timeout parameter at the end which can be used to implement setTimeout() and setInterval() and in browsers process GUI events so that we can run code while waiting for I/O. Because remember, select is blocking - we can only run other code if select returns. With careful management of timers we can calculate the appropriate value to pass as the timeout to select.
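The timer bookkeeping can be sketched like this, assuming timers are kept as a plain array of objects with a `dueAt` time. `nextTimeoutMs` is a hypothetical helper name; real event loops typically keep timers in a heap or similar structure rather than scanning an array.

```javascript
// Hypothetical helper: given the pending timers and the current time, work out
// how long the loop may block in select() before the next timer is due.
// Returns null when there are no timers, meaning "block indefinitely".
function nextTimeoutMs(timers, nowMs) {
  if (timers.length === 0) return null;
  const earliest = Math.min(...timers.map((t) => t.dueAt));
  return Math.max(0, earliest - nowMs); // overdue timers mean "do not wait"
}

const timers = [
  { dueAt: 1500, callback: () => {} }, // e.g. setTimeout(fn, 500) made at t=1000
  { dueAt: 1200, callback: () => {} }, // e.g. setTimeout(fn, 200) made at t=1000
];

console.log(nextTimeoutMs(timers, 1000)); // 200
console.log(nextTimeoutMs(timers, 1300)); // 0
console.log(nextTimeoutMs([], 1000));     // null
```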
The fd_set data structure is not actually a linked list. In older implementations it is a bitfield. More modern implementations can improve on the bitfield as long as they comply with the API. But this partly explains why there are so many competing async APIs like poll, epoll, kqueue etc. They were created to overcome the limitations of select. Different APIs keep track of the file descriptors differently: some use linked lists, some hash tables, some cater for scalability (being able to listen to tens of thousands of sockets), some cater for speed, and most try to do both better than the others. Whatever they use, in the end what is used to store the request is just a data structure that keeps track of file descriptors.

Currently I am working on a database that is updated by another Java application, but I need a NodeJS application to provide a RESTful API for website use. To maximize the performance of the NodeJS application, it is clustered and runs on a multi-core processor.
However, from my understanding, a clustered NodeJS application has its own event loop on each CPU core. If so, does that mean that with the cluster architecture, NodeJS will have to face traditional concurrency issues like in other multi-threading architectures, for example, writing to the same object which is not write-protected? Or even worse, since these are multiple processes running at the same time, not threads within a process that can be blocked by one another...
I have been searching the Internet, but it seems nobody cares about that at all. Can anyone explain the cluster architecture of NodeJS? Thanks very much
Add on:
Just to clarify, I am using express. It is not like running multiple instances on different ports; it is actually listening on the same port, but with one process on each CPU competing to handle requests...
The typical problem I am wondering about now is: a request to update Object A based on a given Object B (not finished yet), then another request to update Object A again with a given Object C (which finishes before the first request)... then the result would be based on Object B rather than C, because the first request actually finishes after the second one.
This would not be a problem in a real single-threaded application, because the second one would always be executed after the first request...
The core of your question is:
NodeJS will have to face traditional concurrency issues like in other multi-threading architectures, for example, writing to the same object which is not write-protected?
The answer is that that scenario is usually not possible because node.js processes don't share memory. ObjectA, ObjectB and ObjectC in process A are different from ObjectA, ObjectB and ObjectC in process B. And since each process is single-threaded, contention cannot happen. This is the main reason you find that there are no semaphore or mutex modules shipped with node.js. There were also no threading modules shipped with node.js when this was written (the worker_threads module was added in later versions).
This also explains why "nobody cares". Because they assume it can't happen.
The problem with node.js clusters is one of caching. Because ObjectA in process A and ObjectA in process B are completely different objects, they will have completely different data. The traditional solution to this is of course not to store dynamic state in your application but to store them in the database instead (or memcache). It's also possible to implement your own cache/data synchronization scheme in your code if you want. That's how database clusters work after all.
Of course node, being a program written in C++, can be easily extended in C or C++, and there are modules on npm that implement threads, mutexes and shared memory. If you deliberately choose to go against the node.js/javascript design philosophy then it is your responsibility to ensure nothing goes wrong.
Additional answer:
a request to update Object A based on a given Object B (not finished yet), then another request to update Object A again with a given Object C (which finishes before the first request)... then the result would be based on Object B rather than C, because the first request actually finishes after the second one.
This would not be a problem in a real single-threaded application, because the second one would always be executed after the first request...
First of all, let me clear up a misconception you have: that this is not a problem for a real single-threaded application. Here's a single-threaded application in pseudocode:
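    function main () {
        timeout = FOREVER
        readFd = []
        writeFd = []
        databaseSock1 = socket(DATABASE_IP,DATABASE_PORT)
        send(databaseSock1,UPDATE_OBJECT_B)
        databaseSock2 = socket(DATABASE_IP,DATABASE_PORT)
        send(databaseSock2,UPDATE_OBJECT_C)
        push(readFd,databaseSock1)
        push(readFd,databaseSock2)
        while(1) {
            event = select(readFd,writeFd,timeout)
            if (event) {
                for (i=0; i<length(readFd); i++) {
                    if (readable(readFd[i])) {
                        data = read(readFd[i])
                        if (data == OBJECT_B_UPDATED) {
                            update(objectA,objectB)
                        }
                        if (data == OBJECT_C_UPDATED) {
                            update(objectA,objectC)
                        }
                    }
                }
            }
        }
    }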
As you can see, there are no threads in the program above, just asynchronous I/O using the select system call. The program above can easily be translated directly into single-threaded C or Java etc. (indeed, something similar to it is at the core of the javascript event loop).
However, if the response to UPDATE_OBJECT_C arrives before the response to UPDATE_OBJECT_B the final state would be that objectA is updated based on the value of objectB instead of objectC.
No asynchronous single-threaded program is immune to this in any language and node.js is no exception.
Note however that you don't end up in a corrupted state (though you do end up in an unexpected state). Multithreaded programs are worse off because without locks/semaphores/mutexes the call to update(objectA,objectB) can be interrupted by the call to update(objectA,objectC) and objectA will be corrupted. This is what you don't have to worry about in single-threaded apps and you won't have to worry about it in node.js.
If you need strictly sequential updates you still need to either wait for the first update to finish, flag the first update as invalid, or generate an error for the second update. Typically for web apps (like stackoverflow) an error would be returned (for example if you try to submit a comment while someone else has already updated the comments).
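One way to "flag the first update as invalid" is to tag each request with a sequence number and drop any response that is older than one already applied. The sketch below is only an illustration of that idea; `createGuardedState`, `begin` and `complete` are hypothetical names, and the responses are simulated by calling `complete` by hand instead of going through a real database.

```javascript
// Hypothetical sketch: tag every request with a sequence number and ignore a
// response if the response to a newer request has already been applied.
function createGuardedState(initial) {
  let value = initial;
  let issued = 0;   // sequence number of the most recent request sent
  let applied = 0;  // sequence number of the most recent response applied

  return {
    // Call when sending a request; keep the returned token with the request.
    begin() { return ++issued; },
    // Call when the response for `token` arrives.
    complete(token, newValue) {
      if (token < applied) return false; // stale: a later update already landed
      applied = token;
      value = newValue;
      return true;
    },
    get value() { return value; },
  };
}

const objectA = createGuardedState('initial');
const tokenB = objectA.begin(); // request "update A from B" is sent first
const tokenC = objectA.begin(); // request "update A from C" is sent second

// The responses arrive out of order: C first, then B.
console.log(objectA.complete(tokenC, 'based on C')); // true
console.log(objectA.complete(tokenB, 'based on B')); // false
console.log(objectA.value);                          // based on C
```

Whether to drop the stale response silently, as here, or report an error to the caller is a policy decision for the application.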

I am building a CPU-intensive web app, where I'll write the CPU-intensive parts in C++ and the web server in node.js. Node.js would be connected to the C++ code via addons. I am confused about one thing:
Say the CPU-intensive operation takes 5 seconds per request (maybe it involves inverting a huge matrix). When such a request comes in, the node.js binding would hand it over to the C++ code.
Does this mean that node.js would not be tied up for the next 5 seconds and can continue serving other requests?
I am confused because I have heard that even though node offers asynchronous features, it is still single-threaded.
Obviously I would not want node.js to be stuck for 5 s, as that is a huge price to pay. Imagine hundreds of requests arriving simultaneously for this intensive operation.
While trying to understand JS callbacks and asynchronicity, I came across many different versions of the following description:
a callback function which is passed to another function as a parameter, runs
following to the time taking process of the function it's passed to.
The dilemma originates with the "time taking" adjective. Is it:
Time taking because of CPU being idle and waiting for a response?
Time taking because of CPU being busy with number crunching like hell?
This is not clear in the description and it confused me. So I tried the following two snippets.
    getData('', writeData);
    document.getElementById('output').innerHTML += "show this before data ...";

    function getData(dataURI, callback) {
        // Normally you would actually connect to a server here.
        // We're just going to simulate a 3-second delay.
        var timer = setTimeout(function () {
            var dataArray = [123, 456, 789, 012, 345, 678];
            callback(dataArray);
        }, 3000);
    }

    function writeData(myData) {
        document.getElementById('output').innerHTML += myData;
    }

    <p id="output"></p>

    getData('', writeData);
    document.getElementById('output').innerHTML += "show this before data ...";

    function getData(dataURI, callback) {
        var dataArray = [123, 456, 789, 012, 345, 678];
        for (i=0; i<1000000000; i++);
        callback(dataArray);
    }

    function writeData(myData) {
        document.getElementById('output').innerHTML += myData;
    }

    <p id="output"></p>
So in both snippets there is a time-taking activity in the getData function. In the first one the CPU is idle and in the second the CPU is busy. Clearly, when the CPU is busy the JS runtime is not asynchronous.
The main thread of Node is the JS event loop, so all logic interacting with JS is single threaded. This also includes any C++ logic triggered directly via JS.
Generally any long-running tasks should be split off into worker processes. For instance, in your case, you could have a worker process that would queue up calculations, emitting events back to the JS thread when they have completed.
So really, it's a question of how you go about your "connected to c++ via addons" code.
I'm not going to refer to the specifics of Node.js as I'm not that familiar with the internal architecture and the possibilities it allows (but I understand it supports multiple worker threads, each representing a different event loop)
In general, if you need to process 100 request/s that take 5 seconds solid CPU time, then there's nothing you can do, except ensuring that you have 500 processors available.
If 100 request/s is peak, while on average it will be much lower, then the solution is queueing, and you use the queue to absorb the blow.
Now things start to get interesting when it is not 5 seconds of solid CPU time, but 0.1 seconds of CPU time and 4.9 seconds of waiting, or anything in between. This is the case where asynchronous processing should be used to put all that waiting time to work.
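The arithmetic behind those numbers can be written down in a few lines. The helper names below are made up for illustration; the second one is just Little's law (requests in flight = arrival rate x time per request).

```javascript
// Back-of-the-envelope sizing. Both helpers are hypothetical, not library code.

// Cores needed = arrival rate x CPU seconds per request.
function requiredCores(requestsPerSecond, cpuSecondsPerRequest) {
  return Math.ceil(requestsPerSecond * cpuSecondsPerRequest);
}

// Requests in flight at once = arrival rate x total seconds per request.
function requestsInFlight(requestsPerSecond, totalSecondsPerRequest) {
  return requestsPerSecond * totalSecondsPerRequest;
}

console.log(requiredCores(100, 5));    // 500
console.log(requiredCores(100, 0.1));  // 10
console.log(requestsInFlight(100, 5)); // 500
```

In the second case roughly 10 cores' worth of CPU has to serve 500 requests that are in flight at the same time, and that gap is what the waiting time leaves room for.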
Asynchronous in this case means that:
All your execution happens in an event loop.
You don't wait, no sleep, no blocking I/O, just execute or return to the event loop.
You split your task into non-blocking subtasks, interspersed with (async) events (e.g. with a response) that continue the execution.
You split your system into a number of event processing services, exchanging requests and responses through asynchronous events and collaborating to provide the overall functionality.
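As a toy illustration of splitting work into subtasks that keep returning to the loop, here is a sketch that uses JavaScript generators and a round-robin scheduler. This is an analogy only: Node.js itself drives callbacks and promises from its event loop rather than generators, and `countingTask` and `runRoundRobin` are hypothetical names.

```javascript
// Hypothetical sketch: each task is a generator that yields between small
// chunks of work, and a round-robin loop interleaves them on one thread.
function* countingTask(name, steps, trace) {
  for (let step = 1; step <= steps; step++) {
    trace.push(`${name}:${step}`); // one short, non-blocking chunk of work
    yield;                          // hand control back to the loop
  }
}

function runRoundRobin(tasks) {
  const ready = [...tasks];
  while (ready.length > 0) {
    const task = ready.shift();
    const { done } = task.next();  // run exactly one chunk
    if (!done) ready.push(task);   // not finished: go to the back of the line
  }
}

const trace = [];
runRoundRobin([
  countingTask('A', 3, trace),
  countingTask('B', 2, trace),
]);
console.log(trace.join(' ')); // A:1 B:1 A:2 B:2 A:3
```

Neither task ever holds the thread for longer than one chunk, so the longer task A cannot starve task B.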
What to do if you have a subsystem you cannot turn into an asynchronous service under the principles above?
The answer is to wrap it with queues (to absorb the requests) + multiple threads (allowing execution of some threads while other threads are waiting), providing the async events request/response interface expected by the rest of the subsystems.
In all cases it is best to keep a bounded number of threads (instead of a per-request thread model) and always keep the total number of active/hot threads in the system below the number of processing resources.
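The "queue plus bounded workers" idea can be sketched without real threads by modelling each job as a function that receives a `done` callback. `createBoundedQueue` is a hypothetical name, and in a real system the jobs would hand their blocking work to a thread or process pool; here they only record their `done` callback so that completion can be triggered by hand.

```javascript
// Hypothetical sketch of "queue + bounded workers". Each job is a function that
// receives a `done` callback and calls it when its (possibly blocking) work ends.
function createBoundedQueue(limit) {
  let active = 0;
  const waiting = [];

  function pump() {
    while (active < limit && waiting.length > 0) {
      const job = waiting.shift();
      active++;
      let finished = false;
      job(() => {
        if (finished) return; // ignore a second call to done()
        finished = true;
        active--;
        pump();               // a slot is free: start the next queued job
      });
    }
  }

  return {
    submit(job) { waiting.push(job); pump(); },
    stats() { return { active, queued: waiting.length }; },
  };
}

// Five jobs arrive at once, but at most two are ever running.
const boundedQueue = createBoundedQueue(2);
const finishers = [];
for (let i = 0; i < 5; i++) {
  boundedQueue.submit((done) => finishers.push(done)); // "runs" until done() is called
}
console.log(boundedQueue.stats()); // { active: 2, queued: 3 }

finishers[0]();                    // the first job completes
console.log(boundedQueue.stats()); // { active: 2, queued: 2 }
```

The queue absorbs the burst while the number of running jobs stays constant, which is the property the paragraph above asks for.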
Node.js is nice in that its input/output is inherently asynchronous and all the infrastructure is geared towards implementing the kind of things I described above.

I have created a C node.js addon with the help of libUV to make the addon asynchronous.
I have made several queues for this.
The code is like this, loopArray is used for storing those queues:
    //... variables declarations

    void AsyncWork(uv_work_t* req) {
        // ...
    }

    void AsyncAfter(uv_work_t* req) {
        // ...
    }

    Handle<Value> RunCallback(const Arguments& args) {
        // ... some preparation work
        int loopNumber = (rand() % 10);
        int status = uv_queue_work(loopArray[loopNumber], &baton->request, AsyncWork, AsyncAfter);
        return Undefined();
    }

    extern "C" {
        static void Init(Handle<Object> target) {
            int i = 0;
            for (i = 0; i < 10; i++) {
                loopArray[i] = uv_loop_new();
            }
            target->Set(String::NewSymbol("callback"), FunctionTemplate::New(RunCallback)->GetFunction());
        }
        NODE_MODULE(addon, Init)
    }
The problem is that, even though I created 10 queues for the CPU-demanding tasks, node.js does not switch between tasks while processing one of the queues. Is it due to the single-threaded nature of node.js?
If so, does uv_thread_create help the situation?
I cannot find any code sample for this, so I am not sure how to use it.
That is the main idea behind node's architecture: Using function call(back)s and a main event loop to run them instead of using threads to process multiple jobs in parallel.
If what you want to do is to process a queue of jobs, the best way to do it is doing one job at a time. Utilizing multiple cpu cores on a system is done by multiple node instances instead of threads. We have child_process and cluster node modules for this.
When you create multiple threads, let's say you want to run 10 threads for your work, if your system has 8 cpu cores, you are killing the performance by giving unnecessary work to operating system's scheduler. This is a very important point you should take into account. If you have 8 cores, you should not create more than 8 threads in parallel if you want the maximum performance.
For node, we don't try to create multiple queues or threads in one process. Instead, we employ multiple node processes, again maximum one process per core.
If you are going to process a queue which is already there, you do not need your C module to be asynchronous.
We want asynchronous behavior when we have jobs coming from outside like http requests on a web server. On a web server, our job comes in a way that we cannot control. People and other machines connect to our server whenever they want and we want to answer each of them as quickly as possible. For this, we do not want any request to block others. We need to handle as many requests as we can in parallel.
If you are running on rows of a database table or making some calculations over a long list of parameters however, you are in a very different kind of business. You have your job queue in front of you waiting for your way of management. Your jobs are not coming to your system in a way you have no control over. In this kind of business, to reach the ultimate efficiency and hit the topmost profits, you should run jobs one after another without any switching between them. Parallelism is only good when you have multiple cores and to employ them, the best practice for node is to use multiple node processes.

With Node.js, or eventlet or any other non-blocking server, what happens when a given request takes long, does it then block all other requests?
Example, a request comes in, and takes 200ms to compute, this will block other requests since e.g. nodejs uses a single thread.
Meaning your 15K per second will go down substantially because of the actual time it takes to compute the response for a given request.
But this just seems wrong to me, so I'm asking what really happens as I can't imagine that is how things work.
Whether or not it "blocks" is dependent on your definition of "block". Typically block means that your CPU is essentially idle, but the current thread isn't able to do anything with it because it is waiting for I/O or the like. That sort of thing doesn't tend to happen in node.js unless you use the non-recommended synchronous I/O functions. Instead, functions return quickly, and when the I/O task they started completes, your callback gets called and you take it from there. In the interim, other requests can be processed.
If you are doing something computation-heavy in node, nothing else is going to be able to use the CPU until it is done, but for a very different reason: the CPU is actually busy. Typically this is not what people mean when they say "blocking", instead, it's just a long computation.
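A small sketch makes the "CPU is actually busy" case visible: a timer that is due immediately still cannot fire until the busy code returns. The 30 ms figure and the names `BUSY_MS` and `timerDelay` are arbitrary choices for illustration.

```javascript
// Sketch: a timer that is due immediately still has to wait for busy code.
const BUSY_MS = 30; // arbitrary value for illustration

const timerDelay = new Promise((resolve) => {
  const scheduledAt = Date.now();
  // Ask for the callback "as soon as possible".
  setTimeout(() => resolve(Date.now() - scheduledAt), 0);

  // Busy-wait: the CPU is working, so the event loop cannot run the timer yet.
  while (Date.now() - scheduledAt < BUSY_MS) { /* spin */ }
});

timerDelay.then((ms) => {
  console.log(`0 ms timer fired after ${ms} ms`);
});
```

The reported delay is at least `BUSY_MS`, because the timer callback can only run once the synchronous loop has finished.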
200ms is a long time for something to take if it doesn't involve I/O and is purely doing computation. That's probably not the sort of thing you should be doing in node, to be honest. A solution more in the spirit of node would be to have that sort of number crunching happen in another (non-javascript) program that is called by node, and that calls your callback when complete. Assuming you have a multi-core machine (or the other program is running on a different machine), node can continue to respond to requests while the other program crunches away.
There are cases where a cluster (as others have mentioned) might help, but I doubt yours is really one of those. Clusters really are made for when you have lots and lots of little requests that together are more than a single core of the CPU can handle, not for the case where you have single requests that take hundreds of milliseconds each.
Everything in node.js runs in parallel internally. However, your own code runs strictly serially. If you sleep for a second in node.js, the server sleeps for a second. It's not suitable for requests that require a lot of computation. I/O is parallel, and your code does I/O through callbacks (so your code is not running while waiting for the I/O).
On most modern platforms, node.js does use threads for I/O. It uses libev, which uses threads where that works best on the platform.
You are exactly correct. Nodejs developers must be aware of that or their applications will be completely non-performant, if long running code is not asynchronous.
Everything that is going to take a 'long time' needs to be done asynchronously.
This is basically true, at least if you don't use the new cluster feature that balances incoming connections between multiple, automatically spawned workers. However, if you do use it, most other requests will still complete quickly.
Edit: Workers are processes.
You can think of the event loop as 10 people waiting in line to pay their bills. If somebody is taking too much time to pay his bill (thus blocking the event loop), the other people will just have to hang around waiting for their turn to come.. and waiting...
In other words:
Since the event loop is running on a single thread, it is very
important that we do not block its execution by doing heavy
computations in callback functions or synchronous I/O. Going over a
large collection of values/objects or performing time-consuming
computations in a callback function prevents the event loop from
further processing other events in the queue.
Here is some code to actually see the blocking / non-blocking in action:
With this example (long CPU-computing task, non I/O):
    var http = require('http');
    handler = function(req, res) {
        console.log('hello');
        for (i = 0; i < 10000000000; i++) { a = i + 5; }
    }
    http.createServer(handler).listen(8080); // port is arbitrary
If you do 2 requests in the browser, only a single hello will be displayed in the server console, meaning that the second request cannot be processed because the first one blocks the Node.js thread.
If we do an I/O task instead (write 2 GB of data on disk, it took a few seconds during my test, even on a SSD):
    http = require('http');
    fs = require('fs');
    buffer = Buffer.alloc(2*1000*1000*1000);
    first = true;
    done = false;

    write = function() {
        fs.writeFile('big.bin', buffer, function() { done = true; });
    }

    handler = function(req, res) {
        if (first) {
            first = false;
            res.end('Starting write..')
            write();
            return;
        }
        if (done) {
            res.end("write done.");
        } else {
            res.end('writing ongoing.');
        }
    }

    http.createServer(handler).listen(8080); // port is arbitrary
Here we can see that the few-second-long I/O write task is non-blocking: if you do other requests in the meantime, you will see "writing ongoing."! This confirms the well-known non-blocking-for-IO feature of Node.js.
