Node.js Synchronous Library Code Blocking Async Execution - node.js

Suppose you've got a 3rd-party library that's got a synchronous API. Naturally, attempting to use it in an async fashion yields undesirable results in the sense that you get blocked when trying to do multiple things in "parallel".
Are there any common patterns that allow us to use such libraries in an async fashion?
Consider the following example (using the async library from NPM for brevity):
var async = require('async');
function ts() {
return new Date().getTime();
}
var startTs = ts();
process.on('exit', function() {
console.log('Total Time: ~' + (ts() - startTs) + ' ms');
});
// This is a dummy function that simulates some 3rd-party synchronous code.
function vendorSyncCode() {
var future = ts() + 50; // ~50 ms in the future.
while(ts() <= future) {} // Spin to simulate blocking work.
}
// My code that handles the workload and uses `vendorSyncCode`.
function myTaskRunner(task, callback) {
// Do async stuff with `task`...
vendorSyncCode(task);
// Do more async stuff...
callback();
}
// Dummy workload.
var work = (function() {
var result = [];
for(var i = 0; i < 100; ++i) result.push(i);
return result;
})();
// Problem:
// -------
// The following two calls will take roughly the same amount of time to complete.
// In this case, ~6 seconds each.
async.each(work, myTaskRunner, function(err) {});
async.eachLimit(work, 10, myTaskRunner, function(err) {});
// Desired:
// --------
// The latter call with 10 "workers" should complete roughly an order of magnitude
// faster than the former.
Are fork/join or spawning worker processes manually my only options?

Yes, it is your only option.
If you need to use 50ms of cpu time to do something, and need to do it 10 times, then you'll need 500ms of cpu time to do it. If you want it to be done in less than 500ms of wall clock time, you need to use more cpus. That means multiple node instances (or a C++ addon that pushes the work out onto the thread pool). How to get multiple instances depends on your app strucuture, a child that you feed the work to using child_process.send() is one way, running multiple servers with cluster is another. Breaking up your server is another way. Say its an image store application, and mostly is fast to process requests, unless someone asks to convert an image into another format and that's cpu intensive. You could push the image processing portion into a different app, and access it through a REST API, leaving the main app server responsive.
If you aren't concerned that it takes 50ms of cpu to do the request, but instead you are concerned that you can't interleave handling of other requests with the processing of the cpu intensive request, then you could break the work up into small chunks, and schedule the next chunk with setInterval(). That's usually a horrid hack, though. Better to restructure the app.

Related

how to use/implement a custom nodejs Circuit Breaker based on the number of requests?

I'm trying to figure out what is the best wait to implement a circuit breaker based of the number of requests been served in a Typescript/express application instead of fails percentage.
Since the application is meant to be executed by large number of users and under a heavy load, I'm trying to customize the response code in order to trigger a horizontal scaling event with k8s/istio.
The first thing I want to start with is to get is the number of requests in nodejs eventloop event if there is some async work in progress, because a big part of my request are executed asynchronously using async/await.
BTW:
I have seen these Libs
https://github.com/bennadel/Node-Circuit-Breaker
https://github.com/nodeshift/opossum
Is there any good Idea/path I can start with in order to make this possible ?
I can't tell for sure from your question, but if what you're trying to do is to just keep track of how many requests are in progress and then do something in particular if that number exceeds a particular value, then you can use this middleware:
function requestCntr() {
let inProgress = 0;
const events = ['finish', 'end', 'error', 'close'];
return function(req, res, next) {
function done() {
// request finished, so decrement the inProgress counter
--inProgress;
// unhook all our event handlers so we don't count it more than one
events.forEach(event => res.off(event, done));
}
// increment counter for requests in progress
++inProgress;
const maxRequests = 10;
if (inProgress > maxRequests) {
console.log('more than 10 requests in flight at the same time');
// do whatever you want to here
}
events.forEach(event => res.on(event, done));
next();
}
}
app.use(requestCntr());

Node.js Spawning multiple threads within a class method

How can I run a single method multiple times multi-threaded when called as a method of a class?
At first I tried to use the cluster module, but I realize it just re-runs the whole process from the start, rightfully so.
How can I achieve something like what's outlined below?
I want a class's method to spawn n processes, and when the parallel tasks are completed, I can resolve a promise which the method returns.
The problem with the code below is that calling cluster.fork() will fork index.js process.
index.js
const Person = require('./Person.js');
var Mary = new Person('Mary');
Mary.run(5).then(() => {...});
console.log('I should only run once, but I am called 5 times too many');
Person.js
const cluster = require('cluster');
class Person{
run(distance){
var completed = 0;
return new Promise((resolve, reject) => {
for(var i = 0; i < distance; i++) {
// run a separate process for each
cluster.fork().send(i).on('message', message => {
if (message === 'completed') { ++completed; }
if (completed === distance) { resolve(); }
});
}
});
}
}
I think the short answer is impossible. It's even worse - this has nothing to do with js. To multi (process or thread) in your particular problem you will essentially need a copy of the object in every thread, since it needs (maybe) access to fields - in this case you would need to either initialize it in every thread or share memory. That last one I don't think is provided in cluster, and not trivial in other languages in every use case.
If the calculation is independent of the Person I suggest you extract it, and use the usual (in index.js):
if(cluster.isWorker) {
//Use the i for calculation
} else {
//Create Person, then fork children in for loop
}
You then collect the results and change the Person as needed. You will be copying index.js, but this is standard and you only run what you need.
The problem is if results are dependent on Person. If these are constant for all i you can still send them to your forks independently. Otherwise what you have is the only way to fork. In general forking in cluster is not meant for methods, but for the app itself, which is the standard forking behavior.
Another solution
Following your comment, I suggest you checkout child_process.execFile or child_process.exec on same file.
This way you can spawn a totally independent process on the fly. Now instead of calling cluster.fork you call execFile. You can use either the exit code or stdout as return values (stderr etc.). Promise is now replaced with:
var results = []
for(var i = 0; i < distance; i++) {
// run a separate process for each
results.push(child_process.execFile().child.execFile('node', 'mymethod.js`,i]));
}
//... catch the exit event from all results or return a callback using results.
Inside mymethod.js Have your code that takes i and returns what you want either in the exit code or through stdout, both properties of the returned child_process. This is a bit un-node.js-y since you're waiting on asynchronous calls, but you're requirements are non standard. Since I'm not sure how you use this perhaps returning a callback with the array is a better idea.

Can this code cause a race condition in socket io?

I am very new to node js and socket io. Can this code lead to a race condition on counter variable. Should I use a locking library for safely updating the counter variable.
"use strict";
module.exports = function (opts) {
var module = {};
var io = opts.io;
var counter = 0;
io.on('connection', function (socket) {
socket.on("inc", function (msg) {
counter += 1;
});
socket.on("dec" , function (msg) {
counter -= 1;
});
});
return module;
};
No, there is no race condition here. Javascript in node.js is single threaded and event driven so only one socket.io event handler is ever executing at a time. This is one of the nice programming simplifications that come from the single threaded model. It runs a given thread of execution to completion and then and only then does it grab the next event from the event queue and run it.
Hopefully you do realize that the same counter variable is accessed by all socket.io connections. While this isn't a race condition, it means that there's only one counter that all socket.io connections are capable of modifying.
If you wanted a per-connection counter (separeate counter for each connection), then you could define the counter variable inside the io.on('connection', ....) handler.
The race conditions you do have to watch out for in node.js are when you make an async call and then continue the rest of your coding logic in the async callback. While the async operation is underway, other node.js code can run and can change publicly accessible variables you may be using. That is not the case in your counter example, but it does occur with lots of other types of node.js programming.
For example, this could be an issue:
var flag = false;
function doSomething() {
// set flag indicating we are in a fs.readFile() operation
flag = true;
fs.readFile("somefile.txt", function(err, data) {
// do something with data
// clear flag
flag = false;
});
}
In this case, immediately after we call fs.readFile(), we are returning control back to the node.js. It is free at that time to run other operations. If another operation could also run this code, then it will tromp on the value of flag and we'd have a concurrency issue.
So, you have to be aware that anytime you make an async operation and then the rest of your logic continues in the callback for the async operation that other code can run and any shared variables can be accessed at that time. You either need to make a local copy of shared data or you need to provide appropriate protections for shared data.
In this particular case, the flag could be incremented and decremented rather than simply set to true or false and it would probably serve the desired purpose of keeping track of whether this file is current being read or not.
Shorter answer:
"Race condition" is when you execute a series of ordered asynchronous functions and because of their async nature they won't finish processing in their original order.
In your code, you are executing a series of ordered synchronous process (increasing or decreasing the counter), So they finish instantly after they start, resulting in ordered output. So no racing here!

Is making sequential HTTP requests a blocking operation in node?

Note that irrelevant information to my question will be 'quoted'
like so (feel free to skip these).
Problem
I am using node to make in-order HTTP requests on behalf of multiple clients. This way, what originally took the client(s) several different page loads to get the desired result, now only takes a single request via my server. I am currently using the ‘async’ module for flow control and ‘request’ module for making the HTTP requests. There are approximately 5 callbacks which, using console.time, takes about ~2 seconds from start to finish (sketch code included below).
Now I am rather inexperienced with node, but I am aware of the
single-threaded nature of node. While I have read many times that node
isn’t built for CPU-bound tasks, I didn’t really understand what that
meant until now. If I have a correct understanding of what’s going on,
this means that what I currently have (in development) is in no way
going to scale to even more than 10 clients.
Question
Since I am not an expert at node, I ask this question (in the title) to get a confirmation that making several sequential HTTP requests is indeed blocking.
Epilogue
If that is the case, I expect I will ask a different SO question (after doing the appropriate research) discussing various possible solutions, should I choose to continue approaching this problem in node (which itself may not be suitable for what I'm trying to do).
Other closing thoughts
I am truly sorry if this question was not detailed enough, too noobish, or had particularly flowery language (I try to be concise).
Thanks and all the upvotes to anyone who can help me with my problem!
The code I mentioned earlier:
var async = require('async');
var request = require('request');
...
async.waterfall([
function(cb) {
console.time('1');
request(someUrl1, function(err, res, body) {
// load and parse the given web page.
// make a callback with data parsed from the web page
});
},
function(someParameters, cb) {
console.timeEnd('1');
console.time('2');
request({url: someUrl2, method: 'POST', form: {/* data */}}, function(err, res, body) {
// more computation
// make a callback with a session cookie given by the visited url
});
},
function(jar, cb) {
console.timeEnd('2');
console.time('3');
request({url: someUrl3, method: 'GET', jar: jar /* cookie from the previous callback */}, function(err, res, body) {
// do more parsing + computation
// make another callback with the results
});
},
function(moreParameters, cb) {
console.timeEnd('3');
console.time('4');
request({url: someUrl4, method: 'POST', jar: jar, form : {/*data*/}}, function(err, res, body) {
// make final callback after some more computation.
//This part takes about ~1s to complete
});
}
], function (err, result) {
console.timeEnd('4'); //
res.status(200).send();
});
Normally, I/O in node.js are non-blocking. You can test this out by making several requests simultaneously to your server. For example, if each request takes 1 second to process, a blocking server would take 2 seconds to process 2 simultaneous requests but a non-blocking server would take just a bit more than 1 second to process both requests.
However, you can deliberately make requests blocking by using the sync-request module instead of request. Obviously, that's not recommended for servers.
Here's a bit of code to demonstrate the difference between blocking and non-blocking I/O:
var req = require('request');
var sync = require('sync-request');
// Load example.com N times (yes, it's a real website):
var N = 10;
console.log('BLOCKING test ==========');
var start = new Date().valueOf();
for (var i=0;i<N;i++) {
var res = sync('GET','http://www.example.com')
console.log('Downloaded ' + res.getBody().length + ' bytes');
}
var end = new Date().valueOf();
console.log('Total time: ' + (end-start) + 'ms');
console.log('NON-BLOCKING test ======');
var loaded = 0;
var start = new Date().valueOf();
for (var i=0;i<N;i++) {
req('http://www.example.com',function( err, response, body ) {
loaded++;
console.log('Downloaded ' + body.length + ' bytes');
if (loaded == N) {
var end = new Date().valueOf();
console.log('Total time: ' + (end-start) + 'ms');
}
})
}
Running the code above you'll see the non-blocking test takes roughly the same amount of time to process all requests as it does for a single request (for example, if you set N = 10, the non-blocking code executes 10 times faster than the blocking code). This clearly illustrates that the requests are non-blocking.
Additional answer:
You also mentioned that you're worried about your process being CPU intensive. But in your code, you're not benchmarking CPU utility. You're mixing both network request time (I/O, which we know is non-blocking) and CPU process time. To measure how much time the request is in blocking mode, change your code to this:
async.waterfall([
function(cb) {
request(someUrl1, function(err, res, body) {
console.time('1');
// load and parse the given web page.
console.timeEnd('1');
// make a callback with data parsed from the web page
});
},
function(someParameters, cb) {
request({url: someUrl2, method: 'POST', form: {/* data */}}, function(err, res, body) {
console.time('2');
// more computation
console.timeEnd('2');
// make a callback with a session cookie given by the visited url
});
},
function(jar, cb) {
request({url: someUrl3, method: 'GET', jar: jar /* cookie from the previous callback */}, function(err, res, body) {
console.time('3');
// do more parsing + computation
console.timeEnd('3');
// make another callback with the results
});
},
function(moreParameters, cb) {
request({url: someUrl4, method: 'POST', jar: jar, form : {/*data*/}}, function(err, res, body) {
console.time('4');
// some more computation.
console.timeEnd('4');
// make final callback
});
}
], function (err, result) {
res.status(200).send();
});
Your code only blocks in the "more computation" parts. So you can completely ignore any time spent waiting for the other parts to execute. In fact, that's exactly how node can serve multiple requests concurrently. While waiting for the other parts to call the respective callbacks (you mention that it may take up to 1 second) node can execute other javascript code and handle other requests.
Your code is non-blocking because it uses non-blocking I/O with the request() function. This means that node.js is free to service other requests while your series of http requests is being fetched.
What async.waterfall() does it to order your requests to be sequential and pass the results of one on to the next. The requests themselves are non-blocking and async.waterfall() does not change or influence that. The series you have just means that you have multiple non-blocking requests in a row.
What you have is analogous to a series of nested setTimeout() calls. For example, this sequence of code takes 5 seconds to get to the inner callback (like your async.waterfall() takes n seconds to get to the last callback):
setTimeout(function() {
setTimeout(function() {
setTimeout(function() {
setTimeout(function() {
setTimeout(function() {
// it takes 5 seconds to get here
}, 1000);
}, 1000);
}, 1000);
}, 1000);
}, 1000);
But, this uses basically zero CPU because it's just 5 consecutive asynchronous operations. The actual node.js process is involved for probably no more than 1ms to schedule the next setTimeout() and then the node.js process literally could be doing lots of other things until the system posts an event to fire the next timer.
You can read more about how the node.js event queue works in these references:
Run Arbitrary Code While Waiting For Callback in Node?
blocking code in non-blocking http server
Hidden threads in Javascript/Node that never execute user code: is it possible, and if so could it lead to an arcane possibility for a race condition?
How does JavaScript handle AJAX responses in the background? (written about the browser, but concept is the same)
If I have a correct understanding of what’s going on, this means that
what I currently have (in development) is in no way going to scale to
even more than 10 clients.
This is not a correct understanding. A node.js process can easily have thousands of non-blocking requests in flight at the same time. Your sequentially measured time is only a start to finish time - it has nothing to do with CPU resources or other OS resources consumed (see comments below on non-blocking resource consumption).
I still have concerns about using node for this particular
application then. I'm worried about how it will scale considering that
the work it is doing is not simple I/O but computationally intensive.
I feel as though I should switch to a platform that enables
multi-threading. Does what I'm asking/the concern I'm expressing make
sense? I could just be spitting total BS and have no idea what I'm
talking about.
Non-blocking I/O consumes almost no CPU (only a little when the request is originally sent and then a little when the result arrives back), but while the compmuter is waiting for the remove result, no CPU is consumed at all and no OS thread is consumed. This is one of the reasons that node.js scales well for non-blocking I/O as no resources are used when the computer is waiting for a response from a remove site.
If your processing of the request is computationally intensive (e.g. takes a measurable amount of pure blocking CPU time to process), then yes you would want to explore getting multiple processes involved in running the computations. There are multiple ways to do this. You can use clustering (so you simply have multiple identical node.js processes each working on requests from different clients) with the nodejs clustering module. Or, you can create a work queue of computationally intensive work to do and have a set of child processes that do the computationally intensive work. Or, there are several other options too. This not the type of problem that one needs to switch away from node.js to solve - it can be solved using node.js just fine.
You can use queue to process concurrent http calls in nodeJs
https://www.npmjs.com/package/concurrent-queue
var cq = require('concurrent-queue');
test_queue = cq();
// request action method
testQueue: function(req, res) {
// queuing each request to process sequentially
test_queue(req.user, function (err, user) {
console.log(user.id+' done');
res.json(200, user)
});
},
// Queue will be processed one by one.
test_queue.limit({ concurrency: 1 }).process(function (user, cb) {
console.log(user.id + ' started')
// async calls will go there
setTimeout(function () {
// on callback of async, call cb and return response.
cb(null, user)
}, 1000);
});
Please remember that it needs to implement for sensitive business calls where the resource needs to be accessed or update at a time by one user only.
This will block your I/O and make your users to wait and response time will be slow.
Optimization:
You can make it faster and optimize it by creating resource dependent queue. So that the there is a separate queue for each shared resource and synchronous calls for same resource can only be execute for same resource and for different resources the calls will be executed asynchronously
Let suppose that you want to implement that on the base of current user. So that for the same user http calls can only execute synchronously and for different users the https calls will be asynchronous
testQueue: function(req, res) {
// if queue not exist for current user.
if(! (test_queue.hasOwnProperty(req.user.id)) ){
// initialize queue for current user
test_queue[req.user.id] = cq();
// initialize queue processing for current user
// Queue will be processed one by one.
test_queue[req.user.id].limit({ concurrency: 1 }).process(function (task, cb) {
console.log(task.id + ' started')
// async functionality will go there
setTimeout(function () {
cb(null, task)
}, 1000)
});
}
// queuing each request in user specific queue to process sequentially
test_queue[req.user.id](req.user, function (err, user) {
if(err){
return;
}
res.json(200, user)
console.log(user.id+' done');
});
},
This will be fast and block I/O for only that resource for which you want.

execute a function parallely, while completing execution of rest of the code

I have a code snippet in nodejs like this:
in every 2 sec, foo() will be called.
function foo()
{
while (count < 10)
{
doSometing()
count ++;``
}
}
doSomething()
{
...
}
The limitation is, foo() has no callback.
How to make while loop execute and foo() completes without waiting for dosomething() to complete (call dosomething() and proceed), and dosomething() executes parallely?
I think, what you want is:
function foo()
{
while (count < 10)
{
process.nextTick(doSometing);
count ++;
}
}
process.nextTick will schedule the execution of doSometing on the next tick of the event loop. So, instead of switching immediately to doSometing this code will just schedule the execution and complete foo first.
You may also try setTimeout(doSometing,0) and setImmediate(doSometing). They'll allow I/O calls to occur before doSometing will be executed.
Passing arguments to doSomething
If you want to pass some parameters to doSomething, then it's best to ensure they'll be encapsulated and won't change before doSomething will be executed:
setTimeout(doSometing.bind(null,foo,bar),0);
In this case doSometing will be called with correct arguments even if foo and bar will be changed or deleted. But this won't work in case if foo is an object and you changes one of its properties.
What the alternatives are?
If you want doSomething to be executed in parallel (not just asynchronous, but actually in parallel), then you may be interested in some job-processing solution. I recommend you to look at kickq:
var kickq = require('kickq');
kickq.process('some_job', function (jobItem, data, cb) {
doSomething(data);
cb();
});
// ...
function foo()
{
while (count < 10)
{
kickq.create('some_job', data);
count ++;
}
}
kickq.process will create a separate process for processing your jobs. So, kickq.create will just register the job to be processed.
kickq uses redis to queue jobs and it won't work without it.
Using node.js build-in modules
Another alternative is building your own job-processor using Child Process. The resulting code may look something like this:
var fork = require('child_process').fork,
child = fork(__dirname + '/do-something.js');
// ...
function foo()
{
while (count < 10)
{
child.send(data);
count ++;
}
}
do-something.js here is a separate .js file with doSomething logic:
process.on('message', doSomething);
The actual code may be more complicated.
Things you should be aware of
Node.js is single-threaded, so it executes only one function at a time. It also can't utilize more then one CPU.
Node.js is asynchronous, so it's capable of processing multiple functions at once by switching between them. It's really efficient when dealing with functions with lots of I/O calls, because it's newer blocks. So, when one function waits for the response from DB, another function is executed. But node.js is not a good choice for blocking tasks with heavy CPU utilization.
It's possible to do real parallel calculations in node.js using modules like child_process and cluster. child_process allows you to start a new node.js process. It also creates a communication channel between parent and child processes. Cluster allows you to run a cluster of identical processes. It's really handy when you're dealing with http requests, because cluster can distribute them randomly between workers. So, it's possible to create a cluster of workers processing your data in parallel, though generally node.js is single-threaded.

Resources