Async callback blocks NodeJS - node.js

I have a server-client based NODE.JS application.
server.js
...
socket.on('message', function(message) {
if(message.code == 103)
{
process_some_data()
}
else
{
console.log("UNKNOWN MESSAGE");
}
});
...
client.js
.. sending responses back to server.js
The process_some_data() function takes about 4 seconds to complete. With just one client that is not a problem, but with 10 they all choke and wait until the last one finishes.
I found out that the entire socket event handler waits until the current job finishes; for example, if I comment out process_some_data(), it does not freeze.
I have tried 2 tweaks, but they didn't work:
...
socket.on('message', function(message) {
if(message.code == 103)
{
setTimeout(function() {
process_some_data();
console.log("FINISH");
}, 1)
}
else
{
console.log("UNKNOWN MESSAGE");
}
});
...
I even tried http://caolan.github.io/async/, but to no avail:
...
socket.on('message', function(message) {
if(message.code == 103)
{
// Array to hold async tasks
var asyncTasks = [];
async.series([
setTimeout(function() {
process_some_data();
console.log("FINISH");
}, 1)
], function (err, results) {
console.log(results);
});
}
else
{
console.log("UNKNOWN MESSAGE");
}
});
...
How can I make this async? I really need this.
Thank you.

You need multiple processes to solve this with Javascript, because Javascript engines are single-threaded.
What?
When it comes to handling I/O events, such as reading a socket, writing to a file or waiting for a signal, Javascript engines give the appearance of doing multiple things at the same time.
They are actually not: it's just that, under most conditions, processing these events takes so little computation, and the events themselves occur with so much time in between (a microsecond is an eternity for a CPU), that the engine can just process them one after another with plenty of time to spare.
In human time-scale, it looks like the engine is doing a lot of stuff in parallel, but it's just working serially at great speed.
No matter how you schedule your code to run, using setTimeout or Promise, it will still block other events from being processed during the time it's actively computing. Long-running computations (in the scale of seconds, instead of milliseconds) expose the single-threaded nature of the engine: it cannot actually do multiple things at the same time.
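The point about setTimeout can be checked with a small experiment. This is only a sketch: busyWait is a made-up stand-in for process_some_data(), and measureTimerDelay is a hypothetical helper name. It asks for a timer after 10 ms, defers the heavy work with setTimeout as in the question, and reports how late the timer actually fired.

```javascript
// busyWait is a hypothetical stand-in for process_some_data():
// pure CPU work with no I/O.
function busyWait(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) { /* spin */ }
}

// Schedules a 10 ms timer, then defers blocking work with setTimeout,
// and reports how long the 10 ms timer really took to fire.
function measureTimerDelay(blockMs, callback) {
  const scheduledAt = Date.now();
  setTimeout(function () {
    callback(Date.now() - scheduledAt);
  }, 10);
  // Deferring the heavy work does not make it run elsewhere:
  // it still occupies the only JavaScript thread once it starts.
  setTimeout(function () {
    busyWait(blockMs);
  }, 1);
}

measureTimerDelay(200, function (elapsedMs) {
  console.log('10 ms timer fired after ' + elapsedMs + ' ms');
});
```

The reported delay should be at least as long as the blocking work, not the 10 ms that was requested.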
Multiple processes
Your computer, however, probably has multiple CPU cores. Unlike the Javascript engine, your hardware is capable of tackling multiple tasks at the same time, at least 1 per core. Even with a single core, your operating system can solve the problem if you run multiple processes.
Since a single Javascript process is single-threaded, you need multiple Javascript processes for this. An easy and time-proven architecture to solve your problem is this:
One Javascript program, running in one process, reads from the socket. Instead of calling process_some_data(), however, it puts all incoming messages in a queue.
This program then sends items from the queue to another Javascript program, running in a different process, that performs the computation using another CPU core. There may be multiple copies of this second process. As a starting point, roughly one active worker process per CPU core makes sense for CPU-bound work; the best number depends on your hardware and workload.
A simple approach for Node is to write an HTTP server, using express, that runs the computationally-intensive task. The main program can then use HTTP to delegate tasks to the workers, while still being able to read from the socket.
This is a good article on the topic of multi-processing with Node, using the cluster API.
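A minimal sketch of the queue-plus-workers architecture described above, using Node's built-in IPC channel rather than HTTP. Everything here is illustrative: createPool, submit and close are hypothetical names, the busy loop stands in for process_some_data(), and the worker source is inlined with `node -e` only so the sketch fits in one file (in a real project it would be its own file started with child_process.fork).

```javascript
const { spawn } = require('child_process');

// Worker program (hypothetical): burns CPU for job.ms milliseconds,
// then reports back over the IPC channel.
const WORKER_SOURCE = `
  process.on('message', function (job) {
    const end = Date.now() + job.ms;
    while (Date.now() < end) {}
    process.send({ id: job.id, worker: process.pid });
  });
`;

function createPool(size) {
  const queue = [];    // jobs waiting for a free worker
  const idle = [];     // workers with nothing to do
  const pending = {};  // job id -> callback
  const workers = [];
  let nextId = 0;

  // Hand queued jobs to idle workers until one of the two runs out.
  function dispatch() {
    while (queue.length > 0 && idle.length > 0) {
      idle.pop().send(queue.shift());
    }
  }

  for (let i = 0; i < size; i++) {
    const worker = spawn(process.execPath, ['-e', WORKER_SOURCE], {
      stdio: ['ignore', 'inherit', 'inherit', 'ipc']
    });
    worker.on('message', function (result) {
      const done = pending[result.id];
      delete pending[result.id];
      idle.push(worker);
      dispatch();
      done(result);
    });
    workers.push(worker);
    idle.push(worker);
  }

  return {
    submit: function (ms, callback) {
      const id = nextId++;
      pending[id] = callback;
      queue.push({ id: id, ms: ms });
      dispatch();
    },
    close: function () {
      workers.forEach(function (w) { w.kill(); });
    }
  };
}

// Usage: four 200 ms jobs shared between two worker processes.
const pool = createPool(2);
const started = Date.now();
let remaining = 4;
for (let i = 0; i < 4; i++) {
  pool.submit(200, function (result) {
    console.log('job ' + result.id + ' done by pid ' + result.worker);
    if (--remaining === 0) {
      console.log('all done in ' + (Date.now() - started) + ' ms');
      pool.close();
    }
  });
}
```

In the socket.on('message') handler from the question, the call to process_some_data() would become a call to pool.submit(...), so the main process stays free to read from the socket.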

Related

How do I make a function run in the background?

I have this code that periodically calls the load function, which does very heavy work taking 10 seconds. The problem is that while load is executing, it blocks the main flow. If I send a simple GET request (like a health check) while load is executing, the GET call is blocked until the load call has finished.
function setLoadInterval() {
var self = this;
this.interval = setInterval(function doHeavyWork() {
// this takes 10 sec
self.load();
self.emit('reloaded');
}, 20000);
}
I tried async.parallel but the GET call was still blocked. I tried setTimeout but got the same result. How do I make load run in the background so that it doesn't block the main flow?
this.interval = setInterval(function doHeavyWork() {
async.parallel([function(cb) {
self.load();
cb(null);
}], function(err) {
if (err) {
// log error
}
self.emit('reloaded');
})
}, 20000);
Node.js uses an event-driven, non-blocking I/O model.
I/O is handed off to the operating system or to a thread pool in the underlying runtime, which is how parallelism is achieved for I/O.
If the task is CPU intensive, you cannot achieve parallelism this way, because JavaScript code itself runs synchronously on a single thread.
However, there are ways around this that offload the CPU-intensive task to a different process.
Option1:
exec or spawn a child process and execute the load() function in that spawned node app. This is okay if the interval fires every 20000 ms, because by the time the next one fires, the 10-second process will have completed.
Otherwise it is dangerous, as it can spawn too many node applications and eat up your system's resources.
Option2:
I don't know how much data self.load() accepts and returns. If it is small and the network overhead is acceptable, make that task a load-balanced web service (maybe 4 web servers running in parallel) that accepts (or rather points to) 1M records and returns the filtered records.
NOTE
It looks like you are using the parallel function from the async library. Keep in mind this note from its documentation:
Note: parallel is about kicking-off I/O tasks in parallel, not about parallel execution of code. If your tasks do not use any timers or perform any I/O, they will actually be executed in series. Any synchronous setup sections for each task will happen one after the other. JavaScript remains single-threaded.
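Option 1 and its warning about piling up processes can be sketched as follows. This is an assumption-laden illustration, not the asker's code: createReloader is a hypothetical helper, LOAD_SCRIPT is a made-up stand-in for a separate script that performs load(), and the interval is shortened so the demo finishes quickly. A flag makes each tick skip if the previous child is still running.

```javascript
const { execFile } = require('child_process');

// Hypothetical stand-in for a separate script that performs load():
// it burns CPU for 300 ms and prints a result.
const LOAD_SCRIPT =
  'const end = Date.now() + 300; while (Date.now() < end) {} console.log("loaded");';

// Returns a function suitable for setInterval. It starts the heavy work in
// a child process and refuses to start a second one while one is running.
function createReloader(onReloaded) {
  let running = false;
  return function reload() {
    if (running) {
      return false; // previous load still in progress: skip this tick
    }
    running = true;
    execFile(process.execPath, ['-e', LOAD_SCRIPT], function (err, stdout) {
      running = false;
      onReloaded(err, stdout.trim());
    });
    return true;
  };
}

// Usage: tick every 100 ms; the main process stays responsive meanwhile.
const reload = createReloader(function (err, output) {
  clearInterval(timer); // stop after the first result so the demo exits
  if (err) {
    console.error('load failed:', err.message);
    return;
  }
  console.log('reloaded:', output);
});
const timer = setInterval(reload, 100);
```

In the original code, the onReloaded callback is where self.emit('reloaded') would go.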

Calling Express get commands concurrently

so currently I have this function
app.get('/requestItems', function(req, res){
//Check if Bots are online
if(botQueue.length == 0){
//If there are no bots in the queue to take the order, then we can't process it.
console.log("Sorry no bots available");
res.send("No Bots available ATM");
return;
} else {
var currentBot = botQueue.shift();
}
requestItems(currentBot.offerInstance, steamIDtoTrade, itemID, userAccessToken);
eventEmitter.on('requestOfferExpired', function(){
console.log("Request offer has timed out/been cancelled");
res.send("Request offer has timed out/been cancelled");
botQueue.push(currentBot);
});
eventEmitter.on('requestOfferAccepted', function(){
console.log("Request offer has completed");
res.send("Request offer has completed");
botQueue.push(currentBot);
});
});
When I call it, it takes about 5 minutes to run. While it's running, I can't seem to make requests to the URL. I know node is single-threaded, but is there a way to run it in parallel/concurrently? Or do I simply need to switch up my design strategy?
EDIT: requestItems function : http://pastebin.com/Eif5CeEv
If the 5 minutes per request is 100% CPU of node.js running with no IO, then you will not get any other requests processed during that time and you will need to run multiple node processes (probably in a cluster) in order to process multiple requests during that time.
If, on the other hand, much of that 5 minutes is doing IO (disk, database or socket), then a proper async IO design will allow your node server to process many requests during that time.
Since it seems unlikely that you're number crunching the CPU for 5 minutes in a row with no IO, my guess is that you need to switch to a proper async IO design and then you can use the advantages of node.js and can have many different requests processing at the same time.
You will need to disclose a lot more about what is going on in that 5 minutes for us to help more specifically. Some questions about those 5 minutes:
What types of operations are taking the 5 minutes of time?
Are you using node.js async disk IO and async database IO in a way that allows other node.js events to flow while a given request is waiting for IO?
Is your request handling strategy designed to have multiple requests being processed at the same time?
Can you show us pieces of the code that are taking 5 minutes?
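As one illustration of what an async design could look like here: the handler in the question registers new listeners on what appears to be a shared eventEmitter for every request and never removes them. A sketch of an alternative, where waitForFirst is a hypothetical helper (not part of Express or the events module) that resolves with whichever event fires first and then cleans up after itself:

```javascript
const { EventEmitter } = require('events');

// Resolves with the name of whichever listed event fires first, then
// removes every listener this call registered, so that listeners do not
// accumulate from one request to the next.
function waitForFirst(emitter, eventNames) {
  return new Promise(function (resolve) {
    const handlers = {};
    eventNames.forEach(function (name) {
      handlers[name] = function () {
        eventNames.forEach(function (n) {
          emitter.removeListener(n, handlers[n]);
        });
        resolve(name);
      };
      emitter.on(name, handlers[name]);
    });
  });
}

// Usage with a stand-in emitter; the event names come from the question.
const offers = new EventEmitter();
waitForFirst(offers, ['requestOfferAccepted', 'requestOfferExpired'])
  .then(function (outcome) {
    // In the Express handler, res.send(...) and botQueue.push(currentBot)
    // would go here.
    console.log('finished with', outcome);
  });
setTimeout(function () {
  offers.emit('requestOfferAccepted');
}, 50);
```

While the promise is pending, nothing is running on the JavaScript thread, so other requests can be served.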

Parallel Request at different paths in NodeJS: long running path 1 is blocking other paths

I am trying out a simple NodeJS app so that I can understand its async nature.
But my problem is that as soon as I hit "/home" from the browser, it waits for the response, and if "/" is hit at the same time, it waits for the "/home" response first and only then responds to the "/" request.
My concern is that if one request needs heavy processing, we can't serve another one in parallel. Is this correct?
app.get("/", function(request, response) {
console.log("/ invoked");
response.writeHead(200, {'Content-Type' : 'text/plain'});
response.write('Logged in! Welcome!');
response.end();
});
app.get("/home", function(request, response) {
console.log("/home invoked");
var obj = {
"fname" : "Dead",
"lname" : "Pool"
}
for (var i = 0; i < 999999999; i++) {
for (var j = 0; j < 2; j++) {
// BS
};
};
response.writeHead(200, {'Content-Type' : 'application/json'});
response.write(JSON.stringify(obj));
response.end();
});
Good question,
Although Node.js is asynchronous by nature, this piece of code:
for (var i = 0; i < 999999999; i++) {
for (var j = 0; j < 2; j++) {
// BS
};
};
is not asynchronous; it actually blocks the Node main thread. Therefore, all other requests have to wait until this big for loop ends.
To do heavy calculations without blocking other requests, I recommend splitting the work up with setTimeout or setInterval:
var i=0;
var interval = setInterval(function() {
if(i++>=999999999){
clearInterval(interval);
}
//do stuff here
},5);
For more information I recommend searching for "Node.js event loop"
As Stasel stated, code like this will block the event loop. Whenever your JavaScript is running on the server, nothing else is running. Asynchronous I/O such as disk I/O may proceed in the background, but its handlers/callbacks won't be called until your synchronous code has finished running. As soon as it finishes, node checks for pending events and calls their handlers.
You have a couple of choices to fix this problem.
Break the work into pieces and let pending events run in between. This is almost the same as Stasel's recommendation, except that 5 ms between single iterations is huge: for something like 999999999 items it would take forever. Instead, I suggest processing the loop in batches and scheduling each next batch with setImmediate. setImmediate schedules the callback to run after pending I/O events have been handled, so if there are no new I/O events (for example, no new HTTP requests) it runs right away. It's fast enough. The question then is how much work to do per batch. I suggest first measuring how long the work takes on average, then scheduling about 50 ms of work per batch. For example, if you find that 1000 items take 100 ms, process 500 items per batch, which is about 50 ms. You can break it down further, but the smaller the batches, the longer the whole job takes, so be careful. Also, since you are processing a huge number of items, try not to create too much garbage, so that the garbage collector doesn't block things much. In this not-so-similar question, I've explained how to insert 10000 documents into MongoDB without blocking the event loop.
Use threads. There are a couple of nice thread implementations that won't let you shoot yourself in the foot. This is a really good idea in this case if you need performance for heavy processing since, as I said above, it is tricky to make a CPU-bound task play nicely with everything else happening in the same process; asynchronous events are perfect for data-bound tasks, not CPU-bound ones. There's the nodejs-threads-a-gogo module you can use. You can also use node-webworker-threads, which is built on threads-a-gogo but exposes the Web Worker API. There's also nPool, which is a bit nicer looking but less popular. They all support thread pools, so implementing a work queue should be straightforward.
Use several processes instead of threads. This might be slower than threads, but for heavy work it is still far better than iterating in the main process. There are different ways to do it. Using processes gives you a design that you can extend to multiple machines rather than just multiple CPUs. You can use a job queue (each worker pulls the next job from the queue whenever it finishes one), a multi-process map-reduce or AWS Elastic MapReduce, or the nodejs cluster module. With the cluster module you can listen on a Unix domain socket in each worker and, for each job, make a request to that socket. When the worker finishes processing the job, it writes the result back on that request. You can search for this; many implementations and modules already exist. You can use 0MQ, RabbitMQ, node's built-in IPC, Unix domain sockets or a Redis queue for communication between processes.
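The first option above (batches scheduled with setImmediate) can be sketched like this. sumInBatches is a made-up example task, and the batch size is arbitrary; in practice you would pick it by measuring, as described. The interval timer stands in for other pending events such as incoming HTTP requests.

```javascript
// Sums the integers 0..total-1 in batches, yielding to the event loop
// between batches so that other pending events can be handled.
function sumInBatches(total, batchSize, callback) {
  let i = 0;
  let sum = 0;
  function runBatch() {
    const end = Math.min(i + batchSize, total);
    for (; i < end; i++) {
      sum += i;
    }
    if (i < total) {
      setImmediate(runBatch); // let pending I/O callbacks run first
    } else {
      callback(sum);
    }
  }
  runBatch();
}

// Stand-in for other work the server has to do in the meantime.
const heartbeat = setInterval(function () {
  console.log('event loop is still responsive');
}, 10);

sumInBatches(50000000, 1000000, function (sum) {
  clearInterval(heartbeat);
  console.log('sum =', sum);
});
```

The first batch still runs synchronously inside the call, so the batch size bounds how long any single stretch of blocking lasts.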

NodeJs how to create a non-blocking computation

I am trying to get my head around creating a non-blocking piece of heavy computation in nodejs. Take this example (stripped out of other stuff):
http.createServer(function(req, res) {
console.log(req.url);
sleep(10000);
res.end('Hello World');
}).listen(8080, function() { console.log("ready"); });
As you can imagine, if I open 2 browser windows at the same time, the first will wait 10 seconds and the other will wait 20, as expected. So, armed with the knowledge that a callback is somehow asynchronous I removed the sleep and put this instead:
doHeavyStuff(function() {
res.end('Hello World');
});
with the function simply defined:
function doHeavyStuff(callback) {
sleep(10000);
callback();
}
that of course does not work... I have also tried to define an EventEmitter and register to it, but the main function of the Emitter has the sleep inside before emitting 'done', for example, so again everything blocks.
I am wondering how other people write non-blocking code... for example the mongojs module, or child_process.exec, are non-blocking, which means that somewhere down in the code they either fork a process or run on another thread and listen to its events. How can I replicate this in a method that, for example, has a long-running process going?
Am I completely misunderstanding the nodejs paradigm? :/
Thanks!
Update: solution (sort of)
Thanks to Linus for the answer; indeed, the only way is to spawn a child process, for example another node script:
http.createServer(function(req, res) {
console.log(req.url);
var child = exec('node calculate.js', function (err, strout, strerr) {
console.log("fatto");
res.end(strout);
});
}).listen(8080, function() { console.log("ready"); });
The calculate.js can take its time to do what it needs and return. In this way, multiple requests will be run in parallel so to speak.
You can't do that directly, without using some of the IO modules in node (such as fs or net). If you need to do a long-running computation, I suggest you do that in a child process (e.g. child_process.fork) or with a queue.
We (Microsoft) just released napajs that can work with Node.js to enable multithreading JavaScript scenarios in the same process.
your code will then look like:
var napa = require('napajs');
// One-time setup.
// You can change number of workers per your requirement.
var zone = napa.zone.create('request-worker-pool', { workers: 4 });
http.createServer(function(req, res) {
console.log(req.url);
zone.execute((request) => {
var result = null;
// Do heavy computation to get result from request
// ...
return result;
}, [req]).then((result) => {
res.end(result.value);
});
}).listen(8080, function() { console.log("ready"); });
You can read this post for more details.
This is a classic misunderstanding of how the event loop works.
This isn't something that is unique to node - if you have a long running computation in a browser, it will also block. The way to do this is to break the computation up into small chunks that yield execution to the event loop, allowing the JS environment to interleave with other competing calls, but there is only ever one thing happening at one time.
The setImmediate demo may be instructive, which you can find here.
If your computation can be split into chunks, you could schedule an executor that processes data for N seconds, then runs again after M seconds. Or spawn a dedicated child process for that task alone, so that the main thread doesn't block.
Although this is an old post (from 8 years ago), here are some updates.
For a Node.js application to perform well, the first priority is to never block the event loop. The sleep(10000) call breaks this rule. This is also why Node.js is a poor fit for CPU-intensive applications: heavy computation runs on the event loop thread (the main, single thread of Node.js) and blocks it.
Multithreaded programming arrived in Node.js with the worker_threads module, stable since version 12. Compared with multi-process programming, it is lightweight and has less overhead.
Although multithreading has been added, Node.js is still built on the event-driven model and asynchronous non-blocking I/O. That's Node.js's DNA.
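A minimal sketch of moving a computation onto a worker thread with the built-in worker_threads module. sumInWorker and the summing loop are made-up examples, and the worker source is inlined with the eval option only to keep the sketch in one file; normally it would live in its own module.

```javascript
const { Worker } = require('worker_threads');

// Worker source (hypothetical task): sum the integers 0..n-1 and post
// the result back to the main thread.
const WORKER_SOURCE = `
  const { parentPort, workerData } = require('worker_threads');
  let sum = 0;
  for (let i = 0; i < workerData.n; i++) { sum += i; }
  parentPort.postMessage(sum);
`;

// Runs the loop on a separate thread and resolves with the result.
function sumInWorker(n) {
  return new Promise(function (resolve, reject) {
    const worker = new Worker(WORKER_SOURCE, {
      eval: true,
      workerData: { n: n }
    });
    worker.once('message', resolve);
    worker.once('error', reject);
  });
}

sumInWorker(100000000).then(function (sum) {
  console.log('worker finished, sum =', sum);
});
console.log('main thread is free while the worker computes');
```

In the HTTP server from the question, res.end(...) would be called from the then() callback, and the event loop could keep accepting requests in the meantime.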

What happens when a single request takes a long time with these non-blocking I/O servers?

With Node.js, or eventlet or any other non-blocking server, what happens when a given request takes long, does it then block all other requests?
Example, a request comes in, and takes 200ms to compute, this will block other requests since e.g. nodejs uses a single thread.
Meaning your 15K per second will go down substantially because of the actual time it takes to compute the response for a given request.
But this just seems wrong to me, so I'm asking what really happens as I can't imagine that is how things work.
Whether or not it "blocks" is dependent on your definition of "block". Typically block means that your CPU is essentially idle, but the current thread isn't able to do anything with it because it is waiting for I/O or the like. That sort of thing doesn't tend to happen in node.js unless you use the non-recommended synchronous I/O functions. Instead, functions return quickly, and when the I/O task they started complete, your callback gets called and you take it from there. In the interim, other requests can be processed.
If you are doing something computation-heavy in node, nothing else is going to be able to use the CPU until it is done, but for a very different reason: the CPU is actually busy. Typically this is not what people mean when they say "blocking", instead, it's just a long computation.
200ms is a long time for something to take if it doesn't involve I/O and is purely doing computation. That's probably not the sort of thing you should be doing in node, to be honest. A solution more in the spirit of node would be to have that sort of number crunching happen in another (non-javascript) program that is called by node, and that calls your callback when complete. Assuming you have a multi-core machine (or the other program is running on a different machine), node can continue to respond to requests while the other program crunches away.
There are cases where a cluster (as others have mentioned) might help, but I doubt yours is really one of those. Clusters really are made for when you have lots and lots of little requests that together are more than a single core of the CPU can handle, not for the case where you have single requests that take hundreds of milliseconds each.
Everything in node.js runs in parallel internally. However, your own code runs strictly serially. If you sleep for a second in node.js, the server sleeps for a second. It's not suitable for requests that require a lot of computation. I/O is parallel, and your code does I/O through callbacks (so your code is not running while waiting for the I/O).
On most modern platforms, node.js does use threads for I/O. It uses libuv (libev in early versions), which uses threads where that works best on the platform.
You are exactly correct. Nodejs developers must be aware of that or their applications will be completely non-performant, if long running code is not asynchronous.
Everything that is going to take a 'long time' needs to be done asynchronously.
This is basically true, at least if you don't use the new cluster feature that balances incoming connections between multiple, automatically spawned workers. However, if you do use it, most other requests will still complete quickly.
Edit: Workers are processes.
You can think of the event loop as 10 people waiting in line to pay their bills. If somebody is taking too much time to pay his bill (thus blocking the event loop), the other people will just have to hang around waiting for their turn to come.. and waiting...
In other words:
Since the event loop is running on a single thread, it is very
important that we do not block its execution by doing heavy
computations in callback functions or synchronous I/O. Going over a
large collection of values/objects or performing time-consuming
computations in a callback function prevents the event loop from
further processing other events in the queue.
Here is some code to actually see the blocking / non-blocking in action:
With this example (long CPU-computing task, non I/O):
var net = require('net');
// net.createServer passes a socket to the connection handler
var handler = function(socket) {
console.log('hello');
for (var i = 0; i < 10000000000; i++) { var a = i + 5; }
};
net.createServer(handler).listen(80);
If you make 2 requests in the browser, only a single hello is displayed in the server console, meaning that the second request cannot be processed because the first one blocks the Node.js thread.
If we do an I/O task instead (writing 2 GB of data to disk, which took a few seconds in my test, even on an SSD):
http = require('http');
fs = require('fs');
buffer = Buffer.alloc(2*1000*1000*1000);
first = true;
done = false;
write = function() {
fs.writeFile('big.bin', buffer, function() { done = true; });
}
handler = function(req, res) {
if (first) {
first = false;
res.end('Starting write..')
write();
return;
}
if (done) {
res.end("write done.");
} else {
res.end('writing ongoing.');
}
}
http.createServer(handler).listen(80);
Here we can see that the write, an I/O task lasting a few seconds, is non-blocking: if you make other requests in the meantime, you will see "writing ongoing." This confirms the well-known non-blocking I/O behavior of Node.js.

Resources