OK, so I am creating a Node/Mongo based CMS. I have the basic framework down. I want to add live stats to the admin panel, and I was wondering if I could do the following:
I create slave nodes from the master node using the native cluster module (Node v0.6).
function start_workers(num_workers){
  for (var i = 0; i < num_workers; i++) {
    exports.workers[i] = cluster.fork();
    console.log('Worker: ' + exports.workers[i].pid + ' Is Online');
  }
}
This spawns X workers. I was wondering: can I get stats on each worker I spawn?
For instance:
for (var x in workers) {
  console.log(workers[x].process.cwd());
}
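What I am ultimately hoping for is something like the sketch below (just an idea, assuming each worker can report its own numbers back to the master over the built-in IPC channel, rather than the master reaching into the worker directly):

var cluster = require('cluster');

if (cluster.isMaster) {
  var stats = {};
  var worker = cluster.fork();
  // Keep the latest sample from each worker, keyed by pid,
  // so the admin panel can read it out of `stats`.
  worker.on('message', function(msg) {
    if (msg.cmd === 'stats') {
      stats[msg.pid] = msg.memory;
    }
  });
} else {
  // Each worker reports its own memory usage every 5 seconds.
  setInterval(function() {
    process.send({ cmd: 'stats', pid: process.pid, memory: process.memoryUsage() });
  }, 5000);
}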
Thanks a lot for your time!
We're diving deeper into Node.js architecture to fully understand how to scale our application.
The obvious solution is to use cluster: https://nodejs.org/api/cluster.html. Everything seems fine, apart from this note in the worker-management description:
Node.js does not automatically manage the number of workers for you, however. It is your responsibility to manage the worker pool for your application's needs.
I have been searching for how to really manage the workers, but most solutions just say:
Start as many workers as you have cores.
But I would like to dynamically scale my worker count up or down depending on the current load on the server. If the server is under load and the queue is getting longer, I would like to start another worker. Conversely, when there isn't much load, I would like to shut workers down (leaving, e.g., a minimum of 2 of them).
The ideal place for this, to my mind, is the master process queue and the event fired when a new request reaches the master process. At that point we can decide whether we need another worker.
Do you have any solution or experience with managing workers from the master process in cluster, starting and killing them dynamically?
Regards,
Radek
The following code will help you understand how to create workers on a per-request basis.
This program forks a new worker for every 10 requests.
Note: you need to open http://localhost:8000/ and refresh the page to generate more requests.
var cluster = require('cluster');
var http = require('http');
var numCPUs = require('os').cpus().length;

var numReqs = 0;          // total requests handled so far, across all workers
var initialRequest = 10;  // threshold at which the next worker is forked
var totalcluster = 2;     // number of workers currently running

if (cluster.isMaster) {
  // Fork a worker and count the requests it reports back.
  var forkWorker = function() {
    var worker = cluster.fork();
    console.log('cluster master forked worker ' + worker.process.pid);
    worker.on('message', function(msg) {
      if (msg.cmd && msg.cmd == 'notifyRequest') {
        numReqs++;
      }
    });
  };

  // Start with two workers.
  for (var i = 0; i < 2; i++) {
    forkWorker();
  }

  // Once a second, check whether the request count justifies another worker.
  setInterval(function() {
    console.log("numReqs =", numReqs);
    if (isNeedWorker(numReqs)) {
      forkWorker();
    }
  }, 1000);
} else {
  console.log('worker initialized');
  // Worker processes run an HTTP server.
  http.Server(function(req, res) {
    res.writeHead(200);
    res.end("hello world\n");
    // Notify the master process that a request was handled.
    process.send({ cmd: 'notifyRequest' });
  }).listen(8000);
}

// Fork another worker for every 10 requests, capped at one worker per CPU core.
function isNeedWorker(numReqs) {
  if (numReqs >= initialRequest && totalcluster < numCPUs) {
    initialRequest = initialRequest + 10;
    totalcluster = totalcluster + 1;
    return true;
  } else {
    return false;
  }
}
To manually manage your workers, you need a messaging layer to facilitate inter-process communication. With IPC, master and workers can communicate effectively; from an architectural standpoint this behaviour is already implemented natively via the process module. However, I find the native implementation not flexible or robust enough to handle horizontal scaling driven by network requests.
One obvious solution is Redis as a message broker to facilitate this style of master/worker communication. However, this solution also has its faults, namely the latency added by each command/reply round trip.
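For example (a rough sketch only, assuming the classic callback-style node_redis client and a Redis server on localhost), master and workers could exchange messages over a pub/sub channel:

var redis = require('redis');

// One connection publishes and a second one subscribes,
// since a subscribed connection cannot issue other commands.
var publisher = redis.createClient();
var subscriber = redis.createClient();

subscriber.on('message', function(channel, message) {
  console.log('got job on ' + channel + ': ' + message);
});
subscriber.subscribe('jobs');

// Elsewhere (e.g. in the master) push work onto the channel.
publisher.publish('jobs', JSON.stringify({ task: 'resize-image', id: 42 }));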
Further research led me to RabbitMQ, a great fit for distributing time-consuming tasks among multiple workers. The main idea behind Work Queues (a.k.a. Task Queues) is to avoid doing a resource-intensive task immediately and having to wait for it to complete. Instead, we schedule the task to be done later. We encapsulate a task as a message and send it to the queue. A worker process running in the background will pop the tasks and eventually execute the job. When you run many workers, the tasks will be shared between them.
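A minimal sketch of that pattern (assuming the amqplib package and a RabbitMQ broker on localhost; producer and consumer are shown in one file for brevity, with hypothetical queue and task names):

var amqp = require('amqplib/callback_api');

amqp.connect('amqp://localhost', function(err, conn) {
  if (err) throw err;
  conn.createChannel(function(err, ch) {
    if (err) throw err;
    var q = 'task_queue';
    ch.assertQueue(q, { durable: true });

    // Producer side (e.g. the master): push a task onto the queue.
    ch.sendToQueue(q, Buffer.from(JSON.stringify({ job: 'parse-page' })), { persistent: true });

    // Consumer side (a worker): take one task at a time and ack when done.
    ch.prefetch(1);
    ch.consume(q, function(msg) {
      var task = JSON.parse(msg.content.toString());
      console.log('processing', task);
      ch.ack(msg);
    }, { noAck: false });
  });
});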
To implement a robust server, read this link; it may give some insights. Link
So I am new to cluster and MongoDB, and I came across this bunch of code.
#!/usr/bin/env node
var cluster = require('cluster');
var os = require('os');
var app = require('../main');
var models = require("../models");

if (cluster.isMaster) {
  var mCoreCount = os.cpus().length;
  console.log("Cores : ", mCoreCount);
  for (var i = 0; i < mCoreCount; i++) {
    cluster.fork();
  }
  cluster.on('exit', function() {
    cluster.fork();
  });
} else {
  models.sequelize.sync().then(function() {
    app.listen(app.get('port'), function() {
      console.log('api is live.' + app.get('port'));
    });
  });
}
So when I log it to the console I get cores as 4. I tried reading up on it but I could not understand anything; if someone could point out what's going on here it would be a great help.
I understood that the greater the number of cores, the more Node instances there will be, but I guess right now it's picking that number up from my system. What happens in production?
This script is trying to launch the Node.js app in the most efficient way, by creating a fork for each core available on the server.
It picks up the number of cores with os.cpus().length.
In production it will do the same thing: the number of forks will depend on the number of cores available on the production server.
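If you want more control over that in production, a common pattern (sketch only; WEB_CONCURRENCY is just an example variable name, not something this script already reads) is to let an environment variable override the core count:

var os = require('os');

// Use the value from the environment if set, otherwise one worker per core.
var numWorkers = parseInt(process.env.WEB_CONCURRENCY, 10) || os.cpus().length;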
Are you really sure the database is MongoDB in both environments? We can't really tell without seeing the whole app code.
Code sample:
var cpuCoreLength = require("os").cpus().length;
var cluster = require("cluster");

if (cluster.isMaster) {
  // Workers run the common child file rather than this script.
  cluster.setupMaster({
    exec: "child.js"
  });

  var JOB_QUEUE = ["data1", "data2", "data3"]; // for this post, values inserted manually

  var forkWorker = function() {
    var worker = cluster.fork();
    worker.on("message", function(data) {
      if (data.type == "SEND_DATA") {
        // Hand the next pending job to the worker that asked for one.
        worker.send({ "data": JOB_QUEUE.shift() });
      }
    });
  };

  for (var i = 0; i < cpuCoreLength; i++) {
    forkWorker();
  }
} else {
  // This branch shows what child.js does: ask the master for data,
  // then process whatever it sends back.
  process.send({ "type": "SEND_DATA" });
  process.on("message", function(data) {
    // Do the processing
  });
}
Hi, this is sample code for my process. In the above code, JOB_QUEUE is the data to be processed by the workers. When a new worker is created, one item of data is pushed to it. Each worker uses the common file specified in the master's setupMaster setting. This is one copy of the master. I am going to create about 5 master processes with different JOB_QUEUEs but the same child file, because every master node is going to handle a different set of data, and each master will be processing around 1,000,000 items. I created separate masters in order to monitor the data processed by the workers. My question: will running multiple master nodes lead to any performance issue in terms of CPU, or because of the number of CPU cores?
I'm trying to build a Node.js application which will take advantage of multicore machines (a.k.a. clustering), and I have a question about sessions. My code looks like this:
var cluster = exports.cluster = require('cluster');
var numCPUs = require('os').cpus().length;

if (cluster.isMaster) {
  for (var i = 0; i < numCPUs; i++) {
    cluster.fork();
  }
  cluster.on('exit', function(worker, code, signal) {
    console.log('worker ' + worker.process.pid + ' died. Trying to respawn...');
    cluster.fork();
  });
} else {
  // spawn express etc
}
My question is: every time a single user hits a random Node instance (for example, the first time he opens the page he hits node N4), will he keep hitting node N4 on every request until his session expires? For those who didn't understand my question, I will try to explain what I'm worried about:
A user enters my page and logs in on node N3, then I set req.session.userdata to some random data. He refreshes the page and hits node N4. Will I be able to access req.session.userdata from a different node? Does that mean there is a chance for the user to get randomly logged out, or am I just not understanding how clustering with Express works?
You're correct that the in-memory session store in Connect/Express is unsuitable for supporting more than one instance. The solution is to implement a session store with a backing database. My recommendation is connect-redis, and example code is at Session Undefined - Using Connect-Redis / ExpressJS / Node
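As a rough sketch (assuming express-session and one of the older callback-style connect-redis releases; the exact setup varies by version), wiring the store in looks roughly like this:

var express = require('express');
var session = require('express-session');
var RedisStore = require('connect-redis')(session);

var app = express();

// Every worker talks to the same Redis instance, so a session survives
// requests landing on different workers.
app.use(session({
  store: new RedisStore({ host: '127.0.0.1', port: 6379 }),
  secret: 'replace-with-a-real-secret',
  resave: false,
  saveUninitialized: false
}));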
But there are dozens of options.
My team and I are playing around with Node.js (with jsdom/jQuery), parsing a lot of HTML documents stored in CouchDB. Node.js is single threaded, so having 8 cores in a server does not help us at all initially. This is where I was wondering how best to create child processes (workers, perhaps?) to process each individual file as it's pulled out of CouchDB.
Here is my thought process:
The main Node.js script loops through a CouchDB view, getting the HTML files from documents every X minutes
It spawns a process to parse (jsdom/jQuery) and store the results from each HTML file
We aren't running a webserver at all to handle any of this (it's all command line), so I am unsure how to handle this outside of a generic "set up cron to just run each parsing job separately". It seems that workers are generally used to process requests coming in from a webserver.
Thoughts?
Use the cluster module:
var cluster = require("cluster");
var numCPUs = require('os').cpus().length;
var htmlDocs = [...]; // the HTML documents pulled from CouchDB

if (cluster.isMaster) {
  // Fork one worker per core.
  for (var i = 0; i < numCPUs; i++) {
    cluster.fork();
  }
  cluster.on('death', function(worker) {
    console.log('worker ' + worker.pid + ' died');
  });
} else {
  // NODE_WORKER_ID is the 0.6-era worker id (1..numCPUs); parse it so the
  // loop does numeric arithmetic instead of string concatenation.
  var workerId = parseInt(process.env.NODE_WORKER_ID, 10);
  for (var i = workerId; i < htmlDocs.length; i += numCPUs) {
    couch.doWork(htmlDocs[i]);
  }
}
This is a classic case of doing work on members in an array and then splitting that work out over multiple processes by having each process do a subset of the array.
Note how we increment i by the number of processes. This means worker 1 does the 1st, 5th, 9th, etc., and worker 2 does the 2nd, 6th, 10th, etc.