Is rotating child processes in node.js/cluster a good idea? - node.js

Apache Web Server has a config parameter called MaxRequestsPerChild.
http://httpd.apache.org/docs/2.0/en/mod/mpm_common.html#maxrequestsperchild
"After MaxRequestsPerChild requests, the child process will die."
To avoid crashes caused by memory leaks, too many connections, or other unexpected errors, should I do the same thing when using the node.js Cluster module?
*I'm using Nginx in front of node.js, not Apache. I only mentioned it to make the explanation easier.
I just implemented it like this:
var cluster = require('cluster');
var http = require('http');

var maxReqsPerChild = 10; // Small number for debug
var numReqs = 0;

if (cluster.isMaster) {
  var numCPUs = require('os').cpus().length;
  for (var i = 0; i < numCPUs; i++) {
    cluster.fork();
  }
  cluster.on('death', function(worker) {
    // Fork another when one died
    cluster.fork();
  });
} else {
  http.createServer(function(webReq, webRes) {
    // Count up
    numReqs++;
    // Doing something here
    // Kill myself
    if (numReqs > maxReqsPerChild) {
      process.kill(process.pid); // Or more simply, process.exit() is better?
    }
  }).listen(1338);
}
This has been working well so far, but I'm wondering whether there is a more proper way.

MaxRequestsPerChild is useful for papering over memory-leak troubles, but it shouldn't be relied upon, because it only hides the real problem. First try to fix the memory leaks.
It shouldn't be used to work around other issues such as too many connections or other unexpected errors either.
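If you need a starting point for tracking leaks down, it can help to log each worker's memory use over time. This is only a minimal sketch (the 30-second interval and the use of process.memoryUsage() are just one way to do it):
// Minimal sketch (assumption: run inside the worker branch): periodically log
// heap/RSS so a leak shows up as steady growth. The 30-second interval is arbitrary.
setInterval(function() {
  var mem = process.memoryUsage();
  console.log('[worker ' + process.pid + '] rss=' + mem.rss + ' heapUsed=' + mem.heapUsed);
}, 30000);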
When you do use MaxRequestsPerChild, you shouldn't call process.kill() or process.exit(),
because either one immediately closes all in-flight connections.
Instead, call server.close(), which waits for all in-flight connections to finish and then fires the 'close' event.
var server = http.createServer(...);

server.on("close", function() {
  process.exit(0);
});

server.on("request", function() {
  // requestCount and options are defined elsewhere in the full example linked below
  requestCount += 1;
  if (options.max_requests_per_child && (requestCount >= options.max_requests_per_child)) {
    process.send({ cmd: "set", key: "overMaxRequests", value: 1 });
    if (!server.isClosed) {
      server.close();
      server.isClosed = 1;
    }
  }
});
See a complete working example here:
https://github.com/mash/node_angel
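The master side of that message protocol isn't shown above. Here is a minimal sketch of one way the master could handle it; this is an assumption, not the actual node_angel code, and very old node versions emit 'death' rather than 'exit':
// Minimal sketch of a possible master side (not the node_angel code):
// note when a worker reports it is draining, and fork a replacement when any worker exits.
var cluster = require('cluster');

if (cluster.isMaster) {
  function forkWorker() {
    var worker = cluster.fork();
    worker.on('message', function(msg) {
      if (msg && msg.cmd === 'set' && msg.key === 'overMaxRequests') {
        console.log('worker ' + worker.id + ' is draining; it will exit once its open requests finish');
      }
    });
  }

  for (var i = 0; i < require('os').cpus().length; i++) {
    forkWorker();
  }

  cluster.on('exit', function(worker, code, signal) {
    // The drained worker closed its server and exited; start a replacement.
    forkWorker();
  });
}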

Related

Am I performance testing correctly? Library "Memored" storage vs. normal RAM storage in Node.js

I am trying to compare the READ performance of a library called Memored to regular old RAM variables in Node.js.
I expected data stored with Memored to be at least slightly slower to read than RAM storage, but my results show the opposite (see my output below).
I am running this in the Visual Studio Code terminal on Windows 10. It's all written in TypeScript, which gets compiled down to JavaScript and then run with the "node" command.
This is my RAM test:
var normalRAM = {
    firstname: 'qwe',
    lastname: 'fsa'
}
var s = process.hrtime(); // start timer
console.log(normalRAM); // read from RAM
var e = process.hrtime(s) // stop timer
console.log("end0", e[0]); // result in seconds
console.log("end1", e[1]); // result in nanoseconds
This is my Memored test:
// Clustering needed to show Memored in action
// (assumes cluster and memored have been imported further up in the file)
if (cluster.isMaster)
{
    // Fork workers.
    for (let i = 0; i < 1; i++)
    {
        cluster.fork();
    }
}
else
{
    var han = {
        firstname: 'Han',
        lastname: 'Solo'
    }
    // Store and read
    memored.store('character1', han, function ()
    {
        console.log('Value stored!');
        var hrstart = process.hrtime(); // start timer
        memored.read('character1', function (err: any, value: any)
        {
            var hrend = process.hrtime(hrstart) // stop timer
            console.log('Read value:', value);
            console.log("hrend0", hrend[0]); // result in seconds
            console.log("hrend1", hrend[1]); // result in nanoseconds
        });
    });
}
The results:
The RAM read speeds are around 6500000 nanoseconds.
The Memored read speeds are around 1000000 nanoseconds.
Am I testing the speeds incorrectly here? What are the flaws in my methodology? Perhaps my initial assumption is wrong?
I switched the following two lines:
var hrend = process.hrtime(hrstart) // stop timer
console.log('Read value:', value);
To this:
console.log('Read value:', value);
var hrend = process.hrtime(hrstart) // stop timer
That makes more sense for a real scenario, since I would need to read the value like that anyway after the data is returned. The answer to my question is probably: the Memored test only appeared faster because it stopped the timer when the data became available to my callback, not when I actually read it from the "value" variable.
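More generally, timing a single console.log mostly measures console I/O rather than the read itself. A fairer micro-benchmark would time many reads and keep logging out of the measured region; here is a minimal sketch (the iteration count and the local accumulator are arbitrary choices):
// Minimal sketch: time many reads and keep console I/O out of the measured region.
var normalRAM = { firstname: 'qwe', lastname: 'fsa' };
var iterations = 1000000; // arbitrary
var sink = 0;

var start = process.hrtime();
for (var i = 0; i < iterations; i++) {
  // a plain property read from an in-process object ("RAM storage")
  sink += normalRAM.firstname.length;
}
var diff = process.hrtime(start);

var totalNs = diff[0] * 1e9 + diff[1];
console.log('average read time:', totalNs / iterations, 'ns (sink=' + sink + ')');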

NodeJS Cluster Error : SIGSEGV

I use nodejs cluster
var cluster = require('cluster');
var numCPUs = require('os').cpus().length;
if (cluster.isMaster) {
  for (var i = 0; i < numCPUs; i++) {
    var worker = cluster.fork();
    worker.on('exit', function(code, signal) {
      console.log("worker was killed by signal: " + signal);
    });
  }
}
And sometimes, at varying intervals, I get this error:
worker was killed by signal: SIGSEGV
What does this error mean and why does it happen?
node version v0.11.14-pre, Debian
I don't know the exact answer, but I think this could help.
Using PhantomJS I sometimes got the same error (with "phantomjs" in place of "worker"). The situation: I was opening a page using PhantomJS; when the body was ready I wanted a callback to be called, and then I closed Phantom. Roughly speaking, my code was:
phantom.create(function (ph) {
  ...
  callbackDone(result);
  ph.exit();
});
Doing so, the exception was:
signal killed phantomjs: SIGSEGV
At this point I realized it was breaking only for heavy callback tasks. In other words, if the callback function before exit() was a light one, everything was fine; under other conditions it crashed.
The solution: close the object before the callback:
ph.exit();
callbackDone(result);
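Whatever the underlying cause, in a cluster setup you usually also want the master to log the signal and respawn a worker that dies this way. A minimal sketch (the 'exit' event with (worker, code, signal) is the current cluster API; very old versions used 'death'):
// Minimal sketch: respawn any worker that dies, logging the exit code/signal.
var cluster = require('cluster');

if (cluster.isMaster) {
  for (var i = 0; i < require('os').cpus().length; i++) {
    cluster.fork();
  }
  cluster.on('exit', function(worker, code, signal) {
    console.log('worker ' + worker.process.pid + ' died (code=' + code +
                ', signal=' + signal + '), forking a replacement');
    cluster.fork();
  });
}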

Best way to execute parallel processing in Node.js

I'm trying to write a small node application that will search through and parse a large number of files on the file system.
In order to speed up the search, we are attempting to use some sort of map reduce. The plan would be the following simplified scenario:
Web request comes in with a search query
3 processes are started that each get assigned 1000 (different) files
once a process completes, it would 'return' its results back to the main thread
once all processes complete, the main thread would continue by returning the combined result as a JSON result
The questions I have with this are:
Is this doable in Node?
What is the recommended way of doing it?
I've been fiddling, but have gotten no further than the following example using child processes:
initiator:
var child_process = require('child_process');

function Worker() {
  return child_process.fork("myProcess.js");
}

// workItems (the files to scan) and itemsPerProcess are assumed to be defined elsewhere
for (var i = 0; i < require('os').cpus().length; i++) {
  var worker = new Worker();
  worker.send(workItems.slice(i * itemsPerProcess, (i + 1) * itemsPerProcess));
}
myProcess.js
process.on('message', function(msg) {
  var valuesToReturn = [];
  // Do file reading here
  // How would I return valuesToReturn?
  process.exit(0);
});
A few sidenotes:
I'm aware the number of processes should depend on the number of CPUs on the server
I'm also aware of speed restrictions in a file system. Consider it a proof of concept before we move this to a database or Lucene instance :-)
Should be doable. As a simple example:
// parent.js
var child_process = require('child_process');
var numchild = require('os').cpus().length;
var done = 0;

for (var i = 0; i < numchild; i++) {
  var child = child_process.fork('./child');
  child.send((i + 1) * 1000);
  child.on('message', function(message) {
    console.log('[parent] received message from child:', message);
    done++;
    if (done === numchild) {
      console.log('[parent] received all results');
      ...
    }
  });
}
// child.js
process.on('message', function(message) {
  console.log('[child] received message from server:', message);
  setTimeout(function() {
    process.send({
      child  : process.pid,
      result : message + 1
    });
    process.disconnect();
  }, (0.5 + Math.random()) * 5000);
});
So the parent process spawns a number of child processes and passes each a message. It also installs an event handler to listen for any messages sent back from the child (with the result, for instance).
The child process waits for messages from the parent, and starts processing (in this case, it just starts a timer with a random timeout to simulate some work being done). Once it's done, it sends the result back to the parent process and uses process.disconnect() to disconnect itself from the parent (basically stopping the child process).
The parent process keeps track of the number of child processes started, and the number of them that have sent back a result. When those numbers are equal, the parent received all results from the child processes so it can combine all results and return the JSON result.
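The '...' in parent.js above is where the results get combined. A minimal sketch of that part, collecting each child's answer and serializing once everything has arrived (the variable names are just for illustration):
// Minimal sketch of the parent's result collection (a variant of parent.js above).
var child_process = require('child_process');
var numchild = require('os').cpus().length;
var results = [];

for (var i = 0; i < numchild; i++) {
  var child = child_process.fork('./child');
  child.send((i + 1) * 1000);
  child.on('message', function(message) {
    results.push(message.result);
    if (results.length === numchild) {
      // All children have answered: combine and serialize.
      var combined = JSON.stringify(results);
      console.log('[parent] combined result:', combined);
      // e.g. hand `combined` back to the pending web request here
    }
  });
}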
For a distributed problem like this, I've used zmq and it has worked really well. I'll describe a similar problem that I ran into, first attempted to solve with child processes (and failed), and then solved with zmq.
Using bcrypt, or another expensive hashing algorithm, is wise, but it blocks the node process for around 0.5 seconds. We had to offload this to a different server, and as a quick fix I used essentially exactly what you did: run a child process, send messages to it, and have it respond.
The only issue we found was that, for whatever reason, our child process would pin an entire core while doing absolutely no work. (I still haven't figured out why this happened; we ran a trace and it appeared that epoll was failing on the stdout/stdin streams. It also only happened on our Linux boxes and worked fine on OS X.)
edit:
The pinning of the core was fixed in https://github.com/joyent/libuv/commit/12210fe and was related to https://github.com/joyent/node/issues/5504, so if you run into the issue and you're using CentOS + kernel v2.6.32: update node, or update your kernel!
Regardless of the issues I had with child_process.fork(), here's a nifty pattern I always use
client:
var child_process = require('child_process');

function FileParser() {
  this.__callbackById = [];
  this.__callbackIdIncrement = 0;
  this.__process = child_process.fork('./child');
  this.__process.on('message', this.handleMessage.bind(this));
}

FileParser.prototype.handleMessage = function handleMessage(message) {
  var error = message.error;
  var result = message.result;
  var callbackId = message.callbackId;
  var callback = this.__callbackById[callbackId];
  if (!callback) {
    return;
  }
  callback(error, result);
  delete this.__callbackById[callbackId];
};

FileParser.prototype.parse = function parse(data, callback) {
  this.__callbackIdIncrement = (this.__callbackIdIncrement + 1) % 10000000;
  this.__callbackById[this.__callbackIdIncrement] = callback;
  this.__process.send({
    data: data, // optionally you could pass in the path of the file, and open it in the child process
    callbackId: this.__callbackIdIncrement
  });
};

module.exports = FileParser;
child process:
process.on('message', function(message) {
  var callbackId = message.callbackId;
  var data = message.data;

  function respond(error, response) {
    process.send({
      callbackId: callbackId,
      error: error,
      result: response
    });
  }

  // parse data..
  respond(undefined, "computed data");
});
We also need a pattern to synchronize the different processes: when each process finishes its task it responds to us, we increment a count for each finished process, and we call the Semaphore's callback once we've hit the count we want.
function Semaphore(wait, callback) {
  this.callback = callback;
  this.wait = wait;
  this.counted = 0;
}

Semaphore.prototype.signal = function signal() {
  this.counted++;
  if (this.counted >= this.wait) {
    this.callback();
  }
}

module.exports = Semaphore;
Here's a use case that ties all the above patterns together:
var FileParser = require('./FileParser');
var Semaphore = require('./Semaphore');

var arrFileParsers = [];
for (var i = 0; i < require('os').cpus().length; i++) {
  var fileParser = new FileParser();
  arrFileParsers.push(fileParser);
}

function getFiles() {
  return ["file", "file"];
}

var arrResults = [];
function onAllFilesParsed() {
  console.log('all results completed', JSON.stringify(arrResults));
}

var lock = new Semaphore(arrFileParsers.length, onAllFilesParsed);

arrFileParsers.forEach(function(fileParser) {
  var arrFiles = getFiles(); // you need to decide how to split the files into 1k chunks
  fileParser.parse(arrFiles, function (error, result) {
    arrResults.push(result);
    lock.signal();
  });
});
Eventually I used http://zguide.zeromq.org/page:all#The-Load-Balancing-Pattern, where the client was the node.js zmq client and the workers/broker were written in C. This allowed us to scale across multiple machines, instead of just a local machine with subprocesses.
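For reference, the node side of such a setup can stay quite small. A minimal sketch of a REQ client, assuming the legacy "zmq" npm binding and a broker already listening on tcp://127.0.0.1:5559 (both are assumptions):
// Minimal sketch: send one request to the broker and print whichever worker's reply comes back.
var zmq = require('zmq');
var requester = zmq.socket('req');

requester.on('message', function(reply) {
  console.log('received reply:', reply.toString());
  requester.close();
});

requester.connect('tcp://127.0.0.1:5559');
requester.send('batch of files to parse');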

how to prevent memory leak in javascript

I'm stuck on a memory-leak problem in JS.
Javascript:
var index = 0;
function leak() {
  console.log(index);
  index++;
  setTimeout(leak, 0);
}
leak();
Here is my test code. I use Instruments.app to measure its memory use, and the memory grows very fast.
I'm puzzled because there seem to be no variables holding on to memory.
Why?
Any thoughts are appreciated.
Your code creates a chain of closures, which prevents the memory from being released. In your example the memory will only be released after all the timeouts have completed.
This can be seen below (after 100 seconds):
var index = 0;
var timeout;

function leak() {
  index++;
  timeout = setTimeout(leak, 0);
}
leak();

setTimeout(function() {
  clearTimeout(timeout);
}, 100000);

setInterval(function() {
  console.log(process.memoryUsage());
}, 2000);
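To check whether memory is genuinely retained or simply hasn't been collected yet, you can also run node with --expose-gc and force a collection before sampling; a small sketch (the 2-second interval is arbitrary):
// Run with: node --expose-gc test.js
// Forcing a GC before each sample separates memory that is really retained
// from garbage that simply hasn't been collected yet.
setInterval(function() {
  if (global.gc) {
    global.gc();
  }
  console.log(process.memoryUsage());
}, 2000);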

How to gracefully restart a NodeJS server?

Currently, my prod environment for a side project is a git repo, where I pull in some code, manually kill the server with Ctrl-C, and restart it manually.
I realize there are a lot of things wrong with this. For instance, what if a user is still in the middle of doing something important and the process is crunching sensitive data, and I just killed it?!
When I used node v0.4.x there was a nice Cluster module that could restart the server gracefully when the application was in a quiet state. In v0.6.x the Cluster module is built into node, but it's really, really bare, and doesn't have the graceful restart ability.
Anyone know how I can gracefully restart a nodejs server in v0.6.x?
You can handle POSIX signals in node code.
See the example code below: it handles SIGINT (Ctrl-C, for instance) as a STOP signal for all cluster workers, while SIGUSR2 restarts all workers.
So issuing kill -SIGUSR2 PID, where PID is the node master's PID, will restart the whole cluster.
module.exports = function(app) {
  var cluster = require('cluster');
  var numCPUs = require('os').cpus().length;
  var workerList = new Array();
  var sigkill = false;

  if (cluster.isMaster) {
    for (var i = 0; i < numCPUs; i++) {
      var env = process.env;
      var worker = cluster.fork(env);
      workerList.push(worker);
    }

    process.on('SIGUSR2', function() {
      console.log("Received SIGUSR2 from system");
      console.log("There are " + workerList.length + " workers running");
      workerList.forEach(function(worker) {
        console.log("Sending STOP message to worker PID=" + worker.pid);
        worker.send({cmd: "stop"});
      });
    });

    process.on('SIGINT', function() {
      sigkill = true;
      process.exit();
    });

    cluster.on('death', function(worker) {
      if (sigkill) {
        console.warn("SIGINT received - not respawning workers");
        return;
      }
      var newWorker = cluster.fork();
      console.log('Worker ' + worker.pid + ' died and it will be re-spawned');
      removeWorkerFromListByPID(worker.pid);
      workerList.push(newWorker);
    });
  } else {
    process.on('message', function(msg) {
      if (msg.cmd && msg.cmd == 'stop') {
        console.log("Received STOP signal from master");
        app.close();
        process.exit();
      }
    });
    app.listen(3000);
  }

  function removeWorkerFromListByPID(pid) {
    var counter = -1;
    workerList.forEach(function(worker) {
      ++counter;
      if (worker.pid === pid) {
        workerList.splice(counter, 1);
      }
    });
  }
}
There's a module named Forever that can gracefully restart the process. I suppose you could run several instances with cluster (one per core) and use Forever to monitor / restart them.
This is just an option I found; I'm open to suggestions!
There's also a module named PM2. It has the ability to stop all processes in a cluster.
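For reference, typical usage of both tools looks roughly like this (flags vary between versions, so treat these as illustrative):
# Forever: keep an app running and restart it on demand
forever start app.js
forever restart app.js
forever list

# PM2: run the app in cluster mode across the cores, then reload with minimal downtime
pm2 start app.js -i max
pm2 reload all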
